arXiv:1504.07336v1 [math.ST] 28 Apr 2015


Draft of April 29, 2015. 


Information content of partially rank-ordered set samples 

Armin Hatefi† and Mohammad Jafari Jozani‡

† The Fields Institute for Research in Mathematical Sciences & Department of Statistical Sciences, University of Toronto.

‡ Department of Statistics, University of Manitoba, Winnipeg, MB, Canada, R3T 2N2.

Abstract: 

Partially rank-ordered set (PROS) sampling is a generalization of ranked set sampling in which rankers are not required to 
fully rank the sampling units in each set, hence having more flexibility to perform the necessary judgemental ranking process. 
The PROS sampling has a wide range of applications in different fields ranging from environmental and ecological studies to 
medical research and it has been shown to be superior over ranked set sampling and simple random sampling for estimating the 
population mean. We study the Fisher information content and uncertainty structure of the PROS samples and compare them 
with those of simple random sample (SRS) and ranked set sample (RSS) counterparts of the same size from the underlying 
population. We study the uncertainty structure in terms of the Shannon entropy, Renyi entropy and Kullback-Leibler (KL) 
discrimination measures. Several examples including the FI of PROS samples from the location-scale family of distributions 
as well as a regression model are discussed. 

1 Introduction

Ranked set sampling is a powerful and cost-effective data collection technique which can be used to obtain 
more representative samples from the underlying population when a small number of sampling units can 
be fairly accurately ordered with respect to a variable of interest without actual measurements on them 
and at little cost. It is assumed that the exact measurement of the variable of interest is very costly 
but ranking sampling units is cheap. Ranked set sampling has many applications in industrial statistics, 
environmental and ecological studies as well as medical research. Some recent examples include estimating
phytomass (Muttlak and McDonald, 1992), stream habitat area (Mode et al., 1999), mean and variance in
flock management (Ozturk et al., 2005) as well as studying the association between smoking exposure and
three important carcinogenic biomarkers in a lung cancer disease study (Chen and Wang, 2004) and in a
fishery research for estimating the mean stock abundance using the catch-rate data available from previous
years as a concomitant variable (Wang et al., 2009). For recent overviews of the theory and applications
of ranked set sampling and some of its variations see Wolfe (2012) and Chen et al. (2004).


To obtain a ranked set sample (RSS), an initial simple random sample (SRS) of size k is taken. These 
units are ordered, but without actually being measured; we call this judgement ranking, which may be 
perfect or imperfect. Upon ranking, only the smallest unit is measured. Following this, a second SRS of size 
k is taken, ranked and the second smallest unit is measured. This process is repeated until the largest unit 
in a SRS of size k has been measured. In this process, the ranker is asked to declare unique ranks for each 
unit inside the sets. There are many situations where it is difficult to rank all of the sampling units in a set 
with high confidence, particularly when subjective information is utilized in the ranking process. Forcing 
rankers to declare unique ranks can lead to inflated within-set judgment ranking error and consequently 
to invalid statistical inference. Partially rank-ordered set (PROS) sampling design is a generalization of
ranked set sampling, due to Ozturk (2011), which is aimed at reducing the impact of ranking error and
the burden on rankers by not requiring them to provide a full ranking of all the units in each set. Under
PROS sampling technique, rankers have more flexibility by being able to divide the sampling units into
subsets of pre-specified sizes. These subsets are partially rank-ordered so that each unit in subset h has a
rank smaller than the ranks of units in subset h' for all h' > h. An observation is then collected from one
of these subsets in each set. Hatefi et al. (2015) used PROS sampling design to estimate the parameters
of a finite mixture model to analyze the age structure of a fish species. Frey (2012) studied nonparametric
mean estimation using PROS sampling design. Ozturk (2013) proposed statistical procedures that utilize
PROS data from multiple observers to assist in the selection of units for measurement in a basic ranked
set sample design or to construct a judgment post-stratified design.

In this paper, we study information and uncertainty content of PROS samples. To this end, in Section 
2 we provide a formal description of PROS sampling and present some preliminary results on distributional
properties of PROS samples. In Section 3 we obtain the Fisher information (FI) content of PROS samples
and show that it is more than the FI content of its SRS and RSS counterparts of the same size. Several
examples including the FI of PROS samples from the location-scale family of distributions as well as a
simple linear regression model are also discussed in this section. In addition, the effect of subsetting
errors when applying PROS sampling design on the FI content of samples is explored. In Section 4 we
study information and uncertainty of PROS samples using the Shannon entropy, Renyi entropy and KL
information measures and compare them with their SRS and RSS counterparts. Finally, in Section 5 we
give some concluding remarks.

2 Preliminary results on distributional properties of PROS samples


To obtain a PROS sample of size $n$, we choose a set size $S$ and a design parameter $D = \{d_1, \ldots, d_n\}$
that partitions the set $\{1, \ldots, S\}$ into $n$ mutually exclusive subsets. First, $S$ units are randomly selected
and are assigned into subsets $d_r$, $r = 1, \ldots, n$, without actual measurement of the variable of interest and
only based on visual inspection or judgment, etc. Then a unit is selected at random for measurement
from the subset $d_1$ and it is denoted by $X_{(d_1)1}$. Selecting another $S$ units and assigning them into subsets, a
unit is randomly drawn from subset $d_2$ and then it is quantified and denoted by $X_{(d_2)1}$. This process is
repeated until we randomly draw a unit from $d_n$, resulting in $X_{(d_n)1}$. This constitutes one cycle of PROS
sampling technique. The cycle is then repeated $N$ times to generate a PROS sample of the size $Nn$, i.e.
$\{X_{(d_r)i};\ r = 1, \ldots, n;\ i = 1, \ldots, N\}$. Table 1 shows the construction of a balanced PROS sample with
$S = 6$, $n = 2$, $N = 2$ and the design parameter $D = \{d_1, d_2\} = \{\{1, 2, 3\}, \{4, 5, 6\}\}$. Each set includes six
units assigned into two partially ordered subsets. This partial ordering provides the information that the
units in $d_1$ have smaller ranks than the units in $d_2$. In this subsetting process we do not assign any ranks
to units within each subset so that these units are equally likely to take any place in the subset. One unit
in each set is randomly drawn from the selected subset ($d_1$ in the first set and $d_2$ in the second set of each
cycle) and is quantified. The fully measured units are denoted by $X_{(d_r)i}$, $r = 1, 2$; $i = 1, 2$.


Table 1: An example of PROS design

| cycle | set | Subsets | Observation |
|---|---|---|---|
| 1 | $S_1$ | $D_1 = \{d_1, d_2\} = \{\{1,2,3\}, \{4,5,6\}\}$ | $X_{(d_1)1}$ |
|   | $S_2$ | $D_2 = \{d_1, d_2\} = \{\{1,2,3\}, \{4,5,6\}\}$ | $X_{(d_2)1}$ |
| 2 | $S_1$ | $D_1 = \{d_1, d_2\} = \{\{1,2,3\}, \{4,5,6\}\}$ | $X_{(d_1)2}$ |
|   | $S_2$ | $D_2 = \{d_1, d_2\} = \{\{1,2,3\}, \{4,5,6\}\}$ | $X_{(d_2)2}$ |
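
The sampling procedure described above can be sketched in a few lines of Python. This is only an illustration under perfect subsetting: the function name `pros_sample` and the callable `draw` are hypothetical, and sorting by the true values stands in for the judgment ranking that a ranker would perform without measurement.

```python
import random


def pros_sample(draw, n, S, N=1, seed=None):
    """Return a list of (cycle, subset index r, measured value) triples.

    `draw(rng)` returns one unit from the population. Subsetting is perfect
    here: the S units of each set are sorted by their true values, which a
    ranker would only approximate by judgment.
    """
    if S % n != 0:
        raise ValueError("S must be a multiple of n (balanced design)")
    m = S // n                       # number of units in each subset
    rng = random.Random(seed)
    sample = []
    for i in range(1, N + 1):        # cycles
        for r in range(1, n + 1):    # a fresh set of S units for each subset d_r
            units = sorted(draw(rng) for _ in range(S))
            subset = units[(r - 1) * m: r * m]   # d_r = {(r-1)m+1, ..., rm}
            sample.append((i, r, rng.choice(subset)))  # one unit at random
    return sample


if __name__ == "__main__":
    # The design of Table 1: S = 6, n = 2, N = 2, standard normal population.
    for cycle, r, x in pros_sample(lambda g: g.gauss(0.0, 1.0), n=2, S=6, N=2, seed=1):
        print(f"cycle {cycle}, subset d_{r}: {x:.3f}")
```

Each cycle uses $n$ independent sets, so a sample of size $Nn$ requires $NnS$ units to be ranked but only $Nn$ to be measured.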


Throughout the paper, without loss of generality, we assume that $N = 1$ (unless otherwise specified)
and we use PROS(n, S) to denote a PROS sampling design with the set size $S$, the number of subsets $n$
and the design parameter $D = \{d_r, r = 1, \ldots, n\}$ where $d_r = \{(r-1)m + 1, \ldots, rm\}$, in which $m = S/n$
is the number of unranked observations in each subset. We note that RSS and SRS can be expressed as
special cases of the PROS(n, S) design when $S = n$ and $S = 1$, respectively.

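Suppose $X$ is a continuous random variable with probability density function (pdf) $f(x; \theta)$ and
cumulative distribution function (cdf) $F(x; \theta)$, where $\theta$ is the vector of unknown parameters with $\theta \in \Theta$. Let
$\mathbf{X}_{pros} = \{X_{(d_r)}, r = 1, \ldots, n\}$ be a perfect PROS(n, S) sample of size $n$ from $f(\cdot; \theta)$. The PROS data
likelihood function of $\theta$ is given by the joint pdf of $\mathbf{X}_{pros}$ as follows:

$$L(\theta \mid \mathbf{x}_{pros}) = f(\mathbf{x}_{pros}; \theta) = \prod_{r=1}^{n} \Big\{ \frac{1}{m} \sum_{u \in d_r} f^{(u:S)}(x_{(d_r)}; \theta) \Big\},$$

where $f^{(u:S)}(\cdot; \theta)$ is the pdf of the $u$-th order statistic of a SRS of size $S$ from $f(\cdot; \theta)$. For each $X_{(d_r)}$, define
the latent vector $\Delta^{(d_r)} = (\Delta^{(d_r)}(u),\ u \in d_r = \{(r-1)m + 1, \ldots, rm\})$, where

$$\Delta^{(d_r)}(u) = \begin{cases} 1 & \text{if } X_{(d_r)} \text{ is selected from the } u\text{-th position within the subset } d_r; \\ 0 & \text{otherwise,} \end{cases}$$

with $\sum_{u \in d_r} \Delta^{(d_r)}(u) = 1$. Denote $\mathbf{Y}_{pros} = \{(X_{(d_r)}, \Delta^{(d_r)}),\ r = 1, \ldots, n\}$ as the complete PROS data
consisting of $X_{(d_r)}$ and their corresponding latent vectors $\Delta^{(d_r)}$, $r = 1, \ldots, n$. The complete PROS data
likelihood function of $\theta$ using the joint pdf of $\mathbf{Y}_{pros}$ is given by

$$L(\theta \mid \mathbf{y}_{pros}) = f(\mathbf{y}_{pros}; \theta) = \prod_{r=1}^{n} \prod_{u \in d_r} \Big\{ \frac{1}{m} f^{(u:S)}(x_{(d_r)}; \theta) \Big\}^{\Delta^{(d_r)}(u)}. \qquad (1)$$

Furthermore, by summing the joint distribution of $(X_{(d_r)}, \Delta^{(d_r)})$ over $\Delta^{(d_r)} = \delta^{(d_r)}$, the marginal
distribution of $X_{(d_r)}$ is obtained as follows

$$f_{(d_r)}(x_{(d_r)}; \theta) = \sum_{\delta^{(d_r)}} f(x_{(d_r)}, \delta^{(d_r)}; \theta) = \frac{1}{m} \sum_{u \in d_r} f^{(u:S)}(x_{(d_r)}; \theta). \qquad (2)$$

Also, one can easily check that

$$\frac{1}{n} \sum_{r=1}^{n} f_{(d_r)}(x; \theta) = f(x; \theta). \qquad (3)$$

In addition, the conditional distribution of $\Delta^{(d_r)}$ given $X_{(d_r)}$ is

$$f(\delta^{(d_r)} \mid x_{(d_r)}; \theta) = \prod_{u \in d_r} \Big\{ \frac{f^{(u:S)}(x_{(d_r)}; \theta)}{\sum_{v \in d_r} f^{(v:S)}(x_{(d_r)}; \theta)} \Big\}^{\delta^{(d_r)}(u)}. \qquad (4)$$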
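
Identity (3) is easy to confirm numerically. The sketch below does so for a standard normal population, assuming the usual order-statistic density $f^{(u:S)}(x) = S\binom{S-1}{u-1}F(x)^{u-1}[1-F(x)]^{S-u}f(x)$; the helper names are hypothetical and chosen only for this illustration.

```python
from math import comb, erf, exp, pi, sqrt


def norm_pdf(x):
    return exp(-0.5 * x * x) / sqrt(2.0 * pi)


def norm_cdf(x):
    return 0.5 * (1.0 + erf(x / sqrt(2.0)))


def order_stat_pdf(x, u, S, pdf=norm_pdf, cdf=norm_cdf):
    """pdf of the u-th order statistic of a SRS of size S."""
    F = cdf(x)
    return S * comb(S - 1, u - 1) * F ** (u - 1) * (1.0 - F) ** (S - u) * pdf(x)


def pros_marginal_pdf(x, r, n, S, pdf=norm_pdf, cdf=norm_cdf):
    """Marginal pdf (2) of X_(d_r): average of the order-statistic pdfs in d_r."""
    m = S // n
    subset = range((r - 1) * m + 1, r * m + 1)
    return sum(order_stat_pdf(x, u, S, pdf, cdf) for u in subset) / m


if __name__ == "__main__":
    n, S = 3, 12
    for x in (-1.3, 0.0, 0.7):
        avg = sum(pros_marginal_pdf(x, r, n, S) for r in range(1, n + 1)) / n
        print(x, avg, norm_pdf(x))   # the last two columns agree, as in (3)
```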


3 FI content of PROS samples 


In this section, we first obtain the FI content of $\mathbf{Y}_{pros}$, the complete PROS data, and derive analytic results
to compare it with the FI content of SRS and RSS data of the same size. We give examples regarding the
location-scale family of distributions as well as a simple linear regression model. Then, we study the FI
content of $\mathbf{X}^{*}_{pros}$ by modelling an imperfect PROS design involving misplacement errors in the subsetting
process. The FI of PROS samples can play a key role in its theory and application to study the asymptotic
behaviour of the maximum likelihood estimators of $\theta$ as well as the derivation of the Cramer-Rao lower
bound for unbiased estimators of $\theta$ or some of its functions based on PROS samples.

Under the usual regularity conditions (e.g., Chen et al., 2004), the FI matrix is calculated by
$I(\theta) = -E[D^2_\theta \log f(X; \theta)]$, provided the expectation exists, where $D^{\ell}_\theta$ refers to the $\ell$-th derivatives of the
log-likelihood function with respect to $\theta$ with $D^1_\theta = D_\theta$. For any two matrices $A$ and $B$ of the same size,
we use $A \geq 0$ and $A \geq B$ to indicate that $A$ and $A - B$ are non-negative definite matrices. We also let
$\phi_u(\lambda) = (u-1)\, I(\lambda = 0) + (S-u)\, I(\lambda = 1)$ with $\lambda \in \{0, 1\}$, $u = 1, \ldots, S$, where $I$ is the usual indicator
function.


3.1 FI matrix of complete PROS data $\mathbf{Y}_{pros}$

Here we obtain the FI matrix of $\mathbf{Y}_{pros}$ under perfect subsetting assumption. To do so, we need the following
useful result.

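Lemma 1. Suppose $Y_r = X_{(d_r)}$, with pdf $f_{(d_r)}(\cdot; \theta)$, is observed from a continuous distribution with pdf
$f(\cdot; \theta)$ and cdf $F(\cdot; \theta)$, respectively, using a PROS(n, S) design. Let $\Delta^{(d_r)}$ be the latent variable
associated with $Y_r$. For any $\lambda \in \{0, 1\}$ and any function $G(\cdot)$,

$$E\Big\{ \sum_{r=1}^{n} \sum_{u \in d_r} \frac{\Delta^{(d_r)}(u)\, \phi_u(\lambda)\, G(Y_r)}{\lambda + (1 - 2\lambda) F(Y_r; \theta)} \Big\} = n(S-1)\, E[G(X)],$$

subject to the existence of the expectations.

Proof. Let $\lambda = 0$ and write $\bar{F} = 1 - F$. By the total law of expectations and equation (4) we get

$$E\Big\{ \sum_{r=1}^{n} \sum_{u \in d_r} \frac{\Delta^{(d_r)}(u)\,(u-1)\, G(Y_r)}{F(Y_r; \theta)} \Big\} = \frac{1}{m} \sum_{r=1}^{n} \sum_{u \in d_r} (u-1) \int \frac{G(x)}{F(x; \theta)}\, f^{(u:S)}(x; \theta)\, dx$$

$$= n \int G(x) f(x; \theta) \sum_{u=1}^{S} (u-1) \binom{S-1}{u-1} [F(x; \theta)]^{u-2} [\bar{F}(x; \theta)]^{S-u}\, dx$$

$$= n(S-1)\, E[G(X)].$$

The proof for $\lambda = 1$ is similar and hence is omitted. □

Now, we obtain the FI content of $\mathbf{Y}_{pros}$ and compare it with its SRS counterpart of the same size.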
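
The last step of the proof rests on a binomial-sum identity: summing $\phi_u(\lambda)\binom{S-1}{u-1}F^{u-1}(1-F)^{S-u}$ over $u = 1, \ldots, S$ gives $(S-1)[\lambda + (1-2\lambda)F]$, which cancels the denominator in Lemma 1. A minimal numerical check, with hypothetical function names, might look as follows.

```python
from math import comb


def phi_u(u, lam, S):
    """phi_u(lambda) = (u - 1) I(lambda = 0) + (S - u) I(lambda = 1)."""
    return (u - 1) if lam == 0 else (S - u)


def weighted_binomial_sum(F, lam, S):
    """Sum over u of phi_u(lambda) * C(S-1, u-1) * F^(u-1) * (1-F)^(S-u)."""
    return sum(
        phi_u(u, lam, S) * comb(S - 1, u - 1) * F ** (u - 1) * (1.0 - F) ** (S - u)
        for u in range(1, S + 1)
    )


if __name__ == "__main__":
    S = 6
    for lam in (0, 1):
        for F in (0.1, 0.5, 0.9):
            lhs = weighted_binomial_sum(F, lam, S)
            rhs = (S - 1) * (lam + (1 - 2 * lam) * F)
            print(lam, F, lhs, rhs)   # lhs and rhs agree up to rounding
```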


Theorem 1. Under the usual regularity conditions (e.g., Chen et al., 2004), the FI matrix of a complete
PROS(n, S) sample of size $n$ from $f(\cdot; \theta)$ is given by

$$I_{pros}(\theta) = I_{srs}(\theta) + K(\theta),$$

where $I_{srs}(\theta)$ denotes the FI matrix of a SRS of size $n$,

$$K(\theta) = n(S-1)\, E\Big\{ \frac{[D_\theta F(X; \theta)][D_\theta F(X; \theta)]^{T}}{F(X; \theta)\, \bar{F}(X; \theta)} \Big\}$$

is a non-negative definite matrix and the expectation is taken with respect to $X$.

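Proof. Let $Y_r = X_{(d_r)}$, $r = 1, \ldots, n$. Using (1), the log-likelihood function of $\theta$ can be written as

$$\ell_{pros}(\theta) = \text{cst} + \ell^{*}(\theta) + T_p(\theta),$$

where $\text{cst} = \sum_{r=1}^{n} \sum_{u \in d_r} \Delta^{(d_r)}(u) \log\{ n \binom{S-1}{u-1} \}$ is a constant with respect to $\theta$,
$\ell^{*}(\theta) = \sum_{r=1}^{n} \log f(y_r; \theta)$,

$$T_p(\theta) = \sum_{r=1}^{n} \sum_{u \in d_r} \sum_{\lambda=0}^{1} \Delta^{(d_r)}(u)\, \phi_u(\lambda) \log\{ \lambda + (1 - 2\lambda) F(y_r; \theta) \},$$

and, by (3), $-E[D^2_\theta \ell^{*}(\theta)] = I_{srs}(\theta)$. Taking second derivatives of $T_p(\theta)$ with respect to $\theta$, one gets

$$D^2_\theta T_p(\theta) = \sum_{r=1}^{n} \sum_{u \in d_r} \sum_{\lambda=0}^{1} \Delta^{(d_r)}(u)\, \phi_u(\lambda) \Big\{ \frac{(-1)^{\lambda} D^2_\theta F(y_r; \theta)}{\lambda + (1 - 2\lambda) F(y_r; \theta)} - \frac{[D_\theta F(y_r; \theta)][D_\theta F(y_r; \theta)]^{T}}{[\lambda + (1 - 2\lambda) F(y_r; \theta)]^2} \Big\}.$$

Using Lemma 1 we have

$$E\Big\{ \sum_{r=1}^{n} \sum_{u \in d_r} \Delta^{(d_r)}(u)\,(u-1)\, \frac{D^2_\theta F(Y_r; \theta)}{F(Y_r; \theta)} \Big\} = E\Big\{ \sum_{r=1}^{n} \sum_{u \in d_r} \Delta^{(d_r)}(u)\,(S-u)\, \frac{D^2_\theta F(Y_r; \theta)}{\bar{F}(Y_r; \theta)} \Big\} = n(S-1)\, E\{ D^2_\theta F(X; \theta) \}, \qquad (5)$$

so the two terms involving $D^2_\theta F$ cancel in expectation. Similarly, by Lemma 1 we obtain

$$E\Big\{ \sum_{r=1}^{n} \sum_{u \in d_r} \sum_{\lambda=0}^{1} \Delta^{(d_r)}(u)\, \phi_u(\lambda)\, \frac{[D_\theta F(Y_r; \theta)][D_\theta F(Y_r; \theta)]^{T}}{[\lambda + (1 - 2\lambda) F(Y_r; \theta)]^2} \Big\} = n(S-1)\, E\Big\{ \frac{[D_\theta F(X; \theta)][D_\theta F(X; \theta)]^{T}}{F(X; \theta)\, \bar{F}(X; \theta)} \Big\}. \qquad (6)$$

Taking expectation of $D^2_\theta T_p(\theta)$ and from (5) and (6), we obtain

$$K(\theta) = -E[D^2_\theta T_p(\theta)] = n(S-1)\, E\Big\{ \frac{[D_\theta F(X; \theta)][D_\theta F(X; \theta)]^{T}}{F(X; \theta)\, \bar{F}(X; \theta)} \Big\}, \qquad (7)$$

which completes the proof. □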


Theorem 1 shows that the FI matrix of the complete PROS(n, S) sample can be decomposed into the
FI matrix of the SRS data and a non-negative definite matrix, hence $I_{pros}(\theta) \geq I_{srs}(\theta)$. In other words,
a complete PROS sample provides more information about the unknown parameters $\theta$ than a SRS of the same
size. It is worth noting that the result of Chen (2000) and Barabesi and El-Sharaawi (2001) about FI of
RSS data can be obtained as a special case of Theorem 1 by setting $S = n$. We now compare the FI content
of the complete PROS sample with that of RSS of the same size about the unknown parameters $\theta$.



















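Theorem 2. Under the conditions of Theorem 1, the FI matrix of a complete PROS(n, S) sample may
be decomposed as

$$I_{pros}(\theta) = I_{rss}(\theta) + M(\theta),$$

where $I_{rss}(\theta)$ is the FI matrix of an RSS of size $n$ (when the set size is $n$), and

$$M(\theta) = n(S-n)\, E\Big\{ \frac{[D_\theta F(X; \theta)][D_\theta F(X; \theta)]^{T}}{F(X; \theta)\, \bar{F}(X; \theta)} \Big\}$$

is a non-negative definite matrix.

Proof. Using Theorem 1 for $S = n$, we have

$$I_{rss}(\theta) = I_{srs}(\theta) + n(n-1)\, E\Big\{ \frac{[D_\theta F(X; \theta)][D_\theta F(X; \theta)]^{T}}{F(X; \theta)\, \bar{F}(X; \theta)} \Big\},$$

where $I_{srs}(\theta)$ denotes the FI matrix of a SRS of size $n$. Now, the result follows from the above equation
and the expression for $I_{pros}(\theta)$ in Theorem 1. □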


Theorem 2 shows the superiority of a complete PROS sample over an RSS of the same size in terms of the
FI content about the unknown vector of parameters $\theta$. In comparing the Fisher information content of RSS
data to that of SRS data, Barabesi and El-Sharaawi (2001) considered the example of point estimation
within a location-scale family and the example of linear regression. We use the same two examples to
obtain the FI content of a complete PROS data set from the location-scale family of distributions as well
as a simple linear regression model and compare them with those based on SRS and RSS data of the same
size. To this end, let

$$RE_1(\theta) = \frac{\det\{I_{pros}(\theta)\}}{\det\{I_{srs}(\theta)\}} \quad \text{and} \quad RE_2(\theta) = \frac{\det\{I_{pros}(\theta)\}}{\det\{I_{rss}(\theta)\}}.$$


From Theorems 1 and 2 one can notice that the set size ($S$) and the number of the subsets ($n$) are two
important parameters of PROS(n, S) design that influence the FI content of PROS samples. We observe
that increasing $S$ and $n$ results in a considerable gain in $RE_1$ and $RE_2$, respectively. Also, both $RE_1$ and
$RE_2$ increase with the number of the parameters of the model. Later in this section we investigate the
case where the set sizes are fixed in both PROS and RSS designs and consider the effect of the number of
subsets in PROS sampling design on the FI content of PROS data compared with their RSS counterparts.
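
For a scalar parameter the two ratios have a simple closed form. Writing $c$ for the ratio of $E\{[D_\theta F]^2/(F\bar{F})\}$ to the per-observation FI of a SRS (a symbol introduced here only for illustration), Theorems 1 and 2 give $RE_1 = 1 + c(S-1)$ and $RE_2 = \{1 + c(S-1)\}/\{1 + c(n-1)\}$. A minimal sketch, with hypothetical function names:

```python
def re1_scalar(c, S):
    """RE_1 for a single parameter: complete PROS(n, S) versus SRS of size n."""
    return 1.0 + c * (S - 1)


def re2_scalar(c, S, n):
    """RE_2 for a single parameter: complete PROS(n, S) versus RSS of size n."""
    return re1_scalar(c, S) / re1_scalar(c, n)


if __name__ == "__main__":
    c = 0.4805   # normal location constant, as reported in Table 2
    for S, n in ((6, 2), (6, 3), (12, 3)):
        print(S, n, re1_scalar(c, S), re2_scalar(c, S, n))
```

Note that `re1_scalar` does not depend on $n$, and that `re2_scalar` equals one when $S = n$, in line with RSS being the special case $S = n$ of the PROS(n, S) design.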


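Example 1. (Location-Scale family of distributions). Under the assumptions of Theorem 1, if
$f(x; \theta)$ is a member of the location-scale family of distributions with pdf

$$f(x; \theta) = \frac{1}{\sigma}\, g\Big( \frac{x - \mu}{\sigma} \Big), \qquad \theta = (\mu, \sigma) \in \mathbb{R} \times \mathbb{R}^{+},$$

where $g(\cdot)$ is a pdf with corresponding cdf $G(\cdot)$, then, with $Z = (X - \mu)/\sigma$,

$$I_{pros}(\theta) = I_{srs}(\theta) + K(\theta) = \frac{n}{\sigma^2} \begin{pmatrix} E\Big\{ \frac{g'(Z)^2}{g(Z)^2} \Big\} & E\Big\{ \frac{Z g'(Z)^2}{g(Z)^2} \Big\} \\ E\Big\{ \frac{Z g'(Z)^2}{g(Z)^2} \Big\} & E\Big\{ \frac{Z^2 g'(Z)^2}{g(Z)^2} \Big\} - 1 \end{pmatrix} + \frac{n(S-1)}{\sigma^2} \begin{pmatrix} E\Big\{ \frac{g(Z)^2}{G(Z)[1 - G(Z)]} \Big\} & E\Big\{ \frac{Z g(Z)^2}{G(Z)[1 - G(Z)]} \Big\} \\ E\Big\{ \frac{Z g(Z)^2}{G(Z)[1 - G(Z)]} \Big\} & E\Big\{ \frac{Z^2 g(Z)^2}{G(Z)[1 - G(Z)]} \Big\} \end{pmatrix}.$$

If $f(x; \theta)$ is symmetric about the location parameter $\mu$, the FI matrix reduces to

$$I_{pros}(\theta) = \frac{n}{\sigma^2} \begin{pmatrix} E\Big\{ \frac{g'(Z)^2}{g(Z)^2} \Big\} & 0 \\ 0 & E\Big\{ \frac{Z^2 g'(Z)^2}{g(Z)^2} \Big\} - 1 \end{pmatrix} + \frac{n(S-1)}{\sigma^2} \begin{pmatrix} E\Big\{ \frac{g(Z)^2}{G(Z)[1 - G(Z)]} \Big\} & 0 \\ 0 & E\Big\{ \frac{Z^2 g(Z)^2}{G(Z)[1 - G(Z)]} \Big\} \end{pmatrix}.$$

Similar to Barabesi and El-Sharaawi (2001), who compared the relative efficiency of RSS to SRS for some
members of the location-scale family of distributions, Table 2 shows the values of $RE_1$ and $RE_2$ under the
same distributions. As expected, the largest values of $RE_1$ and $RE_2$ are achieved in the cases where both
location and scale parameters are considered to be unknown.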


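Table 2: The values of $RE_i(\theta)$, $i = 1, 2$, for comparing the FI content of the complete PROS(n, S) sample with its SRS and
RSS of the same size for some distributions.

| Distributions | Location | Scale | Shape | $RE_1$ | $RE_2$ |
|---|---|---|---|---|---|
| Exponential | 0 | $\sigma$ | - | $1 + 0.4041(S-1)$ | $1 + 0.4041\{\frac{S-n}{1 + 0.4041(n-1)}\}$ |
| Normal | $\mu$ | 1 | - | $1 + 0.4805(S-1)$ | $1 + 0.4805\{\frac{S-n}{1 + 0.4805(n-1)}\}$ |
|  | 0 | $\sigma$ | - | $1 + 0.1350(S-1)$ | $1 + 0.1350\{\frac{S-n}{1 + 0.1350(n-1)}\}$ |
|  | $\mu$ | $\sigma$ | - | $1 + 0.6155(S-1) + 0.0649(S-1)^2$ | $1 + \{\frac{0.6155(S-n) + 0.0649[(S-1)^2 - (n-1)^2]}{1 + 0.6155(n-1) + 0.0649(n-1)^2}\}$ |
| Logistic | $\mu$ | 1 | - | $1 + 0.5000(S-1)$ | $1 + 0.1666\{\frac{S-n}{0.3332 + 0.1666(n-1)}\}$ |
|  | 0 | $\sigma$ | - | $1 + 0.1513(S-1)$ | $1 + 0.2149\{\frac{S-n}{1.418? + 0.2149(n-1)}\}$ |
|  | $\mu$ | $\sigma$ | - | $1 + 0.6516(S-1) + 0.0757(S-1)^2$ | $1 + \{\frac{0.3081(S-n) + 0.0358[(S-1)^2 - (n-1)^2]}{0.4728 + 0.3081(n-1) + 0.0358(n-1)^2}\}$ |
| Extreme-value | $\mu$ | 1 | - | $1 + 0.4041(S-1)$ |  |
|  | 0 | $\sigma$ | - | $1 + 0.2519(S-1)$ | $1 + 0.2518\{\frac{S-n}{1 + 0.2518(n-1)}\}$ |
|  | $\mu$ | $\sigma$ | - | $1 + 0.6012(S-1) + 0.0686(S-1)^2$ | $1 + \{\frac{\ldots}{1 + 0.6560(n-1) + 0.1017(n-1)^2}\}$ |
| Gamma | 0 | $\sigma$ | 2 | $1 + 0.4393(S-1)$ | $1 + 0.7296\{\frac{S-n}{1.6609 + 0.7296(n-1)}\}$ |
|  | 0 | $\sigma$ | 3 | $1 + 0.4523(S-1)$ | $1 + 1.1690\{\frac{S-n}{2.5846 + 1.1690(n-1)}\}$ |
|  | 0 | $\sigma$ | 4 | $1 + 0.4591(S-1)$ | $1 + 1.6161\{\frac{S-n}{3.5200 + 1.6161(n-1)}\}$ |
|  | 0 | $\sigma$ | 10 | $1 + 0.4718(S-1)$ | $1 + 4.2396\{\frac{S-n}{8.9?20 + 4.2396(n-1)}\}$ |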
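
The two constants in the Normal rows of Table 2 can be recovered by one-dimensional numerical integration. The sketch below uses a plain trapezoidal rule (a choice made here for simplicity, not a method stated in the paper) together with the standard facts that the per-observation FI of a normal SRS is $1/\sigma^2$ for the location and $2/\sigma^2$ for the scale; all function names are hypothetical.

```python
from math import erfc, exp, pi, sqrt


def phi(z):
    """Standard normal pdf."""
    return exp(-0.5 * z * z) / sqrt(2.0 * pi)


def Phi(z):
    """Standard normal cdf, computed with erfc for accuracy in the tails."""
    return 0.5 * erfc(-z / sqrt(2.0))


def Phi_bar(z):
    """Standard normal survival function 1 - Phi(z)."""
    return 0.5 * erfc(z / sqrt(2.0))


def expect(h, lo=-10.0, hi=10.0, steps=20000):
    """Trapezoidal approximation of E[h(Z)] for Z ~ N(0, 1)."""
    dz = (hi - lo) / steps
    total = 0.0
    for k in range(steps + 1):
        z = lo + k * dz
        weight = 0.5 if k in (0, steps) else 1.0
        total += weight * h(z) * phi(z)
    return total * dz


def k_weight(z):
    """g(z)^2 / (G(z)[1 - G(z)]) of Example 1 with g the standard normal pdf."""
    return phi(z) ** 2 / (Phi(z) * Phi_bar(z))


# Ratio of the K(theta) entry to the per-observation SRS information.
c_loc = expect(k_weight) / 1.0                        # location, scale known
c_scale = expect(lambda z: z * z * k_weight(z)) / 2.0  # scale, location known

if __name__ == "__main__":
    print(round(c_loc, 4), round(c_scale, 4))
```

With these two numbers the symmetric-case formula of Example 1 gives $RE_1 = \{1 + c_{loc}(S-1)\}\{1 + c_{scale}(S-1)\}$ when both parameters are unknown, which is how the quadratic entry in the Normal row arises.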


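Example 2. (Linear Regression Model). In this example, PROS(n, S) sampling design is applied
to the simple regression model $Y_i = \beta_0 + \beta_1 x_i + \epsilon_i$ with replicated observations of the response
variable where for each value $x_i$ of independent variable, $i = 1, \ldots, k$, we have a PROS sample of $Y$'s
denoted by $Y_{(d_1)i}, \ldots, Y_{(d_n)i}$. For more details about the use of RSS sampling in this regression model, see
Barreto and Barnett (1999) and Barabesi and El-Sharaawi (2001). Suppose $\epsilon_i$ are independent and
identically distributed random variables from a symmetric distribution with pdf $f(\cdot)$ and cdf $F(\cdot)$, respectively.
Let $E(\epsilon_i) = 0$ and $Var(\epsilon_i) = \sigma^2$. Without loss of generality, we take $\bar{x} = \frac{1}{k} \sum_{i=1}^{k} x_i = 0$
and let $\theta = (\beta_0, \beta_1, \sigma)$. Using Example 1 it is easy to show that

$$I_{srs}(\theta) = \frac{n}{\sigma^2} \sum_{i=1}^{k} \begin{pmatrix} E\Big\{ \frac{f'(Z)^2}{f(Z)^2} \Big\} & x_i E\Big\{ \frac{f'(Z)^2}{f(Z)^2} \Big\} & 0 \\ x_i E\Big\{ \frac{f'(Z)^2}{f(Z)^2} \Big\} & x_i^2 E\Big\{ \frac{f'(Z)^2}{f(Z)^2} \Big\} & 0 \\ 0 & 0 & E\Big\{ \frac{Z^2 f'(Z)^2}{f(Z)^2} \Big\} - 1 \end{pmatrix}$$

and

$$K(\theta) = \frac{n(S-1)}{\sigma^2} \sum_{i=1}^{k} \begin{pmatrix} E\Big\{ \frac{f(Z)^2}{F(Z)[1 - F(Z)]} \Big\} & x_i E\Big\{ \frac{f(Z)^2}{F(Z)[1 - F(Z)]} \Big\} & 0 \\ x_i E\Big\{ \frac{f(Z)^2}{F(Z)[1 - F(Z)]} \Big\} & x_i^2 E\Big\{ \frac{f(Z)^2}{F(Z)[1 - F(Z)]} \Big\} & 0 \\ 0 & 0 & E\Big\{ \frac{Z^2 f(Z)^2}{F(Z)[1 - F(Z)]} \Big\} \end{pmatrix}.$$

Note that $RE_1(\theta)$ is independent of $x_i$ and $\theta$ and it only depends on the pdf $f(\cdot)$ and the corresponding cdf
$F(\cdot)$. As a special case, when the $\epsilon_i$'s are normally distributed, one can easily show that

$$RE_1(\theta) = \{1 + 0.4805(S-1)\}^2\, \{1 + 0.1350(S-1)\}.$$

When $S = n$, we obtain the result of Barabesi and El-Sharaawi (2001) for RSS data as a special case of
our results.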


3.2 FI matrix of $\mathbf{X}^{*}_{pros}$ and the effect of misplacement errors

In this section we obtain the FI matrix of $\mathbf{X}^{*}_{pros}$. We study a setting in which the subsetting
process of the PROS(n, S) design is subject to misplacement errors between the subset groups. For
example, when the actual rank of a unit is in the judgment subset $d_r$, due to judgment ranking error it could
be misplaced into another judgment subset, say $d_s$, $r \neq s$, which leads to a different kind of ranking error
than the one usually encountered in ranked set sampling. Note that the FI matrix of $\mathbf{X}_{pros}$ under perfect
subsetting assumption can also be obtained as a special case of the imperfect subsetting scenario. We use
the missing data model proposed by Arslan and Ozturk (2013) to model possible misplacement errors in
PROS sampling design. Let $\mathbf{X}^{*}_{pros} = \{X_{[d_r]}, r = 1, \ldots, n\}$ denote an imperfect PROS sample where $[\cdot]$ is
used to show the presence of misplacement errors in PROS subsetting process. When the subsetting process
is perfect we simply use $X_{(d_r)}$ to show PROS observations. Let $\alpha$ denote the misplacement probability
matrix,

$$\alpha = \begin{pmatrix} \alpha_{d_1, d_1} & \alpha_{d_1, d_2} & \cdots & \alpha_{d_1, d_n} \\ \alpha_{d_2, d_1} & \alpha_{d_2, d_2} & \cdots & \alpha_{d_2, d_n} \\ \vdots & \vdots & \ddots & \vdots \\ \alpha_{d_n, d_1} & \alpha_{d_n, d_2} & \cdots & \alpha_{d_n, d_n} \end{pmatrix},$$

where $\alpha_{d_r, d_h}$ is the misplacement probability of a unit from subset $d_h$ into subset $d_r$. Since the design
parameter $D$ creates a partition over the sets, the matrix $\alpha$ should be a doubly stochastic matrix such that
$\sum_{r=1}^{n} \alpha_{d_r, d_h} = \sum_{h=1}^{n} \alpha_{d_r, d_h} = 1$. Suppose $f_{[d_r]}(\cdot; \theta)$ is the pdf of $X_{[d_r]}$, $r = 1, \ldots, n$. One can easily show
that

$$f_{[d_r]}(x_{[d_r]}; \theta) = \sum_{h=1}^{n} \alpha_{d_r, d_h}\, f_{(d_h)}(x_{[d_r]}; \theta) = f(x_{[d_r]}; \theta)\, g_r(x_{[d_r]}; \theta), \qquad (8)$$

where

$$g_r(x; \theta) = n \sum_{h=1}^{n} \sum_{u \in d_h} \alpha_{d_r, d_h} \binom{S-1}{u-1} [F(x; \theta)]^{u-1} [1 - F(x; \theta)]^{S-u}. \qquad (9)$$

The likelihood function under an imperfect PROS(n, S) design is now given by

$$L(\Omega) = \prod_{r=1}^{n} f_{[d_r]}(x_{[d_r]}; \theta) = \prod_{r=1}^{n} f(x_{[d_r]}; \theta)\, g_r(x_{[d_r]}; \theta),$$

where $\Omega = (\theta, \alpha)$. To obtain the FI matrix of an imperfect PROS sample and compare it with its SRS
and RSS counterparts we need the following result, the proof of which is left to the reader.

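Lemma 2. Let $Y_r = X_{[d_r]}$, $r = 1, \ldots, n$, be observed from a continuous distribution with pdf $f(\cdot; \theta)$ using
an imperfect PROS(n, S) sampling design. Suppose $f_{[d_r]}(\cdot; \theta)$ and $g_r(\cdot; \theta)$ are defined as in (8) and (9),
respectively. Under the regularity conditions of Chen et al. (2004), we have

(i) $\sum_{r=1}^{n} f_{[d_r]}(x; \theta) = n f(x; \theta)$,

(ii) $\sum_{r=1}^{n} g_r(x; \theta) = n$,

(iii) $E\Big\{ \sum_{r=1}^{n} \frac{[D_\theta g_r(Y_r; \theta)][D_\theta g_r(Y_r; \theta)]^{T}}{g_r(Y_r; \theta)^2} \Big\} = \sum_{r=1}^{n} E\Big\{ \frac{[D_\theta g_r(X; \theta)][D_\theta g_r(X; \theta)]^{T}}{g_r(X; \theta)} \Big\}$.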
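
Part (ii) of Lemma 2 follows from the columns of $\alpha$ summing to one, and it can be checked directly from (9). The sketch below evaluates $g_r$ as a function of the value $F = F(x; \theta)$; the function name and argument layout are illustrative assumptions.

```python
from math import comb


def g_r(F, r, alpha, S):
    """g_r of equation (9), evaluated at F = F(x; theta).

    alpha[r-1][h-1] is the probability that a unit judged to be in subset d_r
    truly belongs to subset d_h; alpha is assumed doubly stochastic.
    """
    n = len(alpha)
    m = S // n
    total = 0.0
    for h in range(1, n + 1):
        for u in range((h - 1) * m + 1, h * m + 1):
            total += (alpha[r - 1][h - 1] * comb(S - 1, u - 1)
                      * F ** (u - 1) * (1.0 - F) ** (S - u))
    return n * total


if __name__ == "__main__":
    alpha = [[0.8, 0.2], [0.2, 0.8]]
    for F in (0.1, 0.5, 0.9):
        print(F, sum(g_r(F, r, alpha, S=6) for r in (1, 2)))   # equals n = 2
```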

Now, we show that the FI content of $\mathbf{X}^{*}_{pros}$ is more than that of its SRS counterpart. Unfortunately, it is hard to
obtain analytical results to compare the FI content of PROS and RSS data; therefore, we rely on
numerical studies for this case (see Tables 3 and 4).
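
Theorem 3. Under the conditions of Lemma 2, the FI matrix of an imperfect PROS(n, S) sample about
unknown parameters $\Omega = (\alpha, \theta)$ is given by

$$I^{*}_{pros}(\Omega) = I_{srs}(\Omega) + \sum_{r=1}^{n} E\Big\{ \frac{[D_\Omega g_r(X; \theta)][D_\Omega g_r(X; \theta)]^{T}}{g_r(X; \theta)} \Big\},$$

where $\sum_{r=1}^{n} E\Big\{ \frac{[D_\Omega g_r(X; \theta)][D_\Omega g_r(X; \theta)]^{T}}{g_r(X; \theta)} \Big\}$ is a non-negative definite matrix.

Proof. The proof is similar to the proof of Theorem 1 and hence it is omitted. □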


To study the effect of misplacement errors in the subsetting process of PROS(n, S) design on the information
content of the sample, following Barabesi and El-Sharaawi (2001), we consider the following misplacement
probabilities matrices when $n = 2$ and $n = 3$,

$$\alpha_1 = \begin{pmatrix} p & 1-p \\ 1-p & p \end{pmatrix} \quad \text{and} \quad \alpha_2 = \begin{pmatrix} p & \frac{1-p}{2} & \frac{1-p}{2} \\ \frac{1-p}{2} & p & \frac{1-p}{2} \\ \frac{1-p}{2} & \frac{1-p}{2} & p \end{pmatrix}.$$
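
Both matrices place $p$ on the diagonal and spread the remaining mass evenly over the other subsets. A small constructor for this pattern is sketched below; extending it to $n > 3$ is an assumption made here for convenience and is not a case considered in the text.

```python
def misplacement_matrix(n, p):
    """n x n matrix with p on the diagonal and (1 - p)/(n - 1) elsewhere.

    For n = 2 and n = 3 this reproduces alpha_1 and alpha_2 above.
    """
    if n < 2 or not 0.0 <= p <= 1.0:
        raise ValueError("need n >= 2 and 0 <= p <= 1")
    off = (1.0 - p) / (n - 1)
    return [[p if r == h else off for h in range(n)] for r in range(n)]


if __name__ == "__main__":
    for row in misplacement_matrix(3, 0.7):
        print(row)
```

Every row and every column sums to one, so the matrix is doubly stochastic, and $p = 1/n$ gives equal entries, which corresponds to random subsetting.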


For some members of the location-scale family of distributions, numerical values of $RE_1(\theta)$ and $RE_2(\theta)$
are calculated to compare the FI content of imperfect PROS samples with their SRS and RSS counterparts
of the same size when $S = 6$ and $S = 12$. These values are reported in Tables 3 and 4, respectively. The
results are calculated through a Monte Carlo simulation study comprising 50,000 replications. Both
tables show that misplacement errors in the subsetting process of PROS sampling have a considerable effect
on the information content of PROS data about the unknown parameters of the model. Note that, when the
subsetting process is done randomly, i.e., $p = 1/2$ when $n = 2$ and $p = 1/3$ in the case $n = 3$, the FI content
of PROS samples is the same as the FI content of SRS and RSS data of the same size. Similar results for
comparing the FI content of imperfect RSS and SRS samples can be found in
Barabesi and El-Sharaawi (2001).
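
To see how a misplacement matrix changes the information in the observed sample, the sketch below computes the FI about a normal location parameter (scale known and equal to one) from the density (8). It deliberately replaces the Monte Carlo study described above with deterministic trapezoidal quadrature and treats a single parameter only, so it is not meant to reproduce the tabulated values; all names are hypothetical.

```python
from math import comb, erfc, exp, pi, sqrt


def phi(z):
    return exp(-0.5 * z * z) / sqrt(2.0 * pi)


def Phi(z):
    return 0.5 * erfc(-z / sqrt(2.0))


def Phi_bar(z):
    return 0.5 * erfc(z / sqrt(2.0))


def g_and_dg(F, Fb, r, alpha, S):
    """g_r of (9) and its derivative with respect to F, with Fb = 1 - F."""
    n = len(alpha)
    m = S // n
    g = dg = 0.0
    for h in range(1, n + 1):
        a = alpha[r - 1][h - 1]
        for u in range((h - 1) * m + 1, h * m + 1):
            c = a * comb(S - 1, u - 1)
            g += c * F ** (u - 1) * Fb ** (S - u)
            if u > 1:
                dg += c * (u - 1) * F ** (u - 2) * Fb ** (S - u)
            if u < S:
                dg -= c * (S - u) * F ** (u - 1) * Fb ** (S - u - 1)
    return n * g, n * dg


def re1_location(alpha, S, lo=-8.0, hi=8.0, steps=4000):
    """FI about mu in an imperfect PROS(n, S) sample, relative to a SRS of size n."""
    n = len(alpha)
    dz = (hi - lo) / steps
    total = 0.0
    for r in range(1, n + 1):
        for k in range(steps + 1):
            z = lo + k * dz
            weight = 0.5 if k in (0, steps) else 1.0
            g, dg = g_and_dg(Phi(z), Phi_bar(z), r, alpha, S)
            if g <= 0.0:
                continue
            # Score of log{phi(x - mu) g_r(Phi(x - mu))} with respect to mu at mu = 0.
            score = z - dg * phi(z) / g
            total += weight * score * score * phi(z) * g * dz
    return total / n   # the SRS information about mu is n when sigma = 1


if __name__ == "__main__":
    for p in (0.5, 0.7, 0.9, 1.0):
        alpha = [[p, 1.0 - p], [1.0 - p, p]]
        print(p, re1_location(alpha, S=6))
```

With $p = 1/2$ and $n = 2$ the function $g_r$ is identically one and the ratio equals one, matching the remark on random subsetting. With $S = n$ and no misplacement, the sample is an ordinary RSS and the ratio reduces to $1 + 0.4805(n-1)$.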



Now, we investigate the effect of PROS sampling parameters $S$ and $n$ on the FI content of PROS samples
compared with their RSS counterparts. To this end, we first calculate the FI content of two ranked set
samples with fixed set sizes 6 and 12 when the cycle size is 1, under both perfect and different imperfect
ranking scenarios. The FI content of RSS samples is then compared with that of PROS samples under
different values of $S$, $n$ and $N$, where $N$ is the number of cycles, chosen to match the number of PROS
observations with their corresponding RSS samples. Under some members of the location-scale family of
distributions, Tables 7 and 8 provide the values of $RE_2(\theta)$ for the sample sizes 6 and 12, respectively,
where the subsetting and ranking error probability matrices are defined following the same structure used
in $\alpha_1$ and $\alpha_2$ with proper adjustments to the off-diagonal elements for the set size. For example, consider
the case where $S = 6$, $n = 3$, $N = 2$ in Table 7. In this case, RSS design with set size $S = 6$ is compared
with the PROS design with set size $S = 6$, each consisting of three subsets ($n = 3$) of equal sizes $m = 2$.
Since the PROS design results in 3 observations (as opposed to RSS that results in 6 observations), PROS
sampling is replicated with two cycles, $N = 2$. The relative efficiency values are simulated through a Monte
Carlo study with 50,000 replications. From Tables 7 and 8 it is at once apparent that sampling parameters
$S$ and $n$ as well as ranking (subsetting) error models play key roles in the information content of PROS
data about unknown parameters of the model. As noted earlier, one observes that the performance of
PROS(n, S) and RSS coincides when $S = n$. We also note that for fixed set size $S$ (in both RSS and PROS
design) and under moderately accurate ranking in RSS design, some PROS samples carry less information
than RSS of the same size about the parameter of the underlying population. However, the difference
between the information content of PROS and RSS data diminishes as $n$ increases to $S$. One may also
observe more informative PROS samples than RSS data of the same size (even with a larger set size than
that of PROS design) when the ranking error in RSS design is large.


3.3 FI using the Dell and Clutter model for misplacement ranking errors

Here, we propose two-stage Monte Carlo simulations to study the effect of misplacement ranking error
models on the FI content of PROS samples. Following the model proposed in Dell and Clutter (1972),
in the first stage we compute the misplacement probabilities of PROS and RSS designs. In the second
stage, these misplacement probabilities are used to compute the FI content of PROS and RSS sampling
designs. Using the Dell and Clutter model for $\rho = 1, 0.9, 0.75, 0.5, 0.25$ (representing different degrees
of association between the ranking covariate and the response variable), the first stage computes the
misplacement probability matrices ($\alpha_i$, $i = 1, \ldots, 5$) for each $\rho$ through simulations of size 5000. Using
the estimated misplacement probabilities, in the second stage, we compute the FI content of the PROS,
RSS and SRS sampling designs through Monte Carlo simulations comprising 50,000 replicates. The
results of the simulation studies for different families of distributions (as in the previous simulation studies) are
reported in Tables 5 and 6. To explore the effect of ranking errors on different distributions, we also computed
the FI content of PROS samples under four different mixtures of two univariate exponential distributions
$f(x; \Psi) = \pi \alpha e^{-\alpha x} + (1 - \pi) \beta e^{-\beta x}$, $x > 0$, where $\pi \in (0, 1)$, $\alpha, \beta > 0$ and $\Psi = (\pi, \alpha, \beta)$. To handle
the mixture of exponential distributions, following Hill (1963), we calculated the numerical values of the
relative efficiencies. To do so, a new parameter $h$ is introduced and the exponential mixture model
with three parameters $(\pi, \alpha, \beta)$ is transformed to a mixture density with two parameters $(\pi, h)$.
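
The first stage of this procedure can be sketched as follows. The sketch assumes one common formulation of a Dell and Clutter type model, in which each unit is ranked on a perceived value $W = \rho X + \sqrt{1 - \rho^2}\,\varepsilon$ with standard normal $X$ and $\varepsilon$, so that $\rho$ is the correlation between the true and perceived values. This formulation and the function name are assumptions made for illustration, not the paper's exact specification.

```python
import random


def estimate_misplacement(n, S, rho, reps=5000, seed=None):
    """Estimate alpha[r][h]: share of units judged into subset d_(r+1)
    whose true rank lies in subset d_(h+1)."""
    m = S // n
    rng = random.Random(seed)
    counts = [[0] * n for _ in range(n)]
    noise_sd = (1.0 - rho * rho) ** 0.5
    for _ in range(reps):
        x = [rng.gauss(0.0, 1.0) for _ in range(S)]                  # true values
        w = [rho * xi + noise_sd * rng.gauss(0.0, 1.0) for xi in x]  # perceived values
        by_true = sorted(range(S), key=x.__getitem__)
        true_pos = {unit: pos for pos, unit in enumerate(by_true)}
        by_judged = sorted(range(S), key=w.__getitem__)
        for judged_pos, unit in enumerate(by_judged):
            counts[judged_pos // m][true_pos[unit] // m] += 1
    total = reps * m   # units judged into each subset over all replications
    return [[c / total for c in row] for row in counts]


if __name__ == "__main__":
    for rho in (1.0, 0.9, 0.5):
        print(rho, estimate_misplacement(n=2, S=6, rho=rho, reps=5000, seed=1))
```

Because every set contributes exactly $m$ units to each judged subset and $m$ units to each true subset, the estimated matrix is doubly stochastic by construction, and $\rho = 1$ returns the identity matrix.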

In the next section, we study the uncertainty structure (as another aspect of information content) of PROS 
samples in terms of some well-known measures including Shannon entropy, Renyi entropy and KL information. 

Table 3: Values of $RE_1$ and $RE_2$ to compare the FI content of imperfect PROS data with its SRS and
RSS counterparts of the same size for some distributions when $S = 6$.

| Distribution | n | Design | p = 0 | 0.1 | 0.2 | 0.3 | 0.4 | 0.5 | 0.6 | 0.7 | 0.8 | 0.9 | 1 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Normal | 2 | $RE_1$ | 2.48 | 1.67 | 1.34 | 1.14 | 1.03 | 1.000 | 1.03 | 1.14 | 1.34 | 1.67 | 2.48 |
|  |  | $RE_2$ | 1.47 | 1.25 | 1.14 | 1.06 | 1.02 | 1.000 | 1.02 | 1.06 | 1.14 | 1.25 | 1.47 |
|  | 3 | $RE_1$ | 1.82 | 1.28 | 1.08 | 1.004 | 1.02 | 1.11 | 1.28 | 1.54 | 1.94 | 2.54 | 3.78 |
|  |  | $RE_2$ | 1.28 | 1.11 | 1.03 | 1.002 | 1.01 | 1.04 | 1.11 | 1.19 | 1.28 | 1.38 | 1.54 |
| Exponential | 2 | $RE_1$ | 1.93 | 1.47 | 1.24 | 1.10 | 1.02 | 1.000 | 1.02 | 1.10 | 1.24 | 1.47 | 1.93 |
|  |  | $RE_2$ | 1.37 | 1.20 | 1.11 | 1.05 | 1.01 | 1.000 | 1.01 | 1.05 | 1.11 | 1.20 | 1.37 |
|  | 3 | $RE_1$ | 1.47 | 1.18 | 1.05 | 1.003 | 1.01 | 1.07 | 1.18 | 1.35 | 1.58 | 1.90 | 2.44 |
|  |  | $RE_2$ | 1.18 | 1.08 | 1.02 | 1.001 | 1.01 | 1.03 | 1.07 | 1.13 | 1.19 | 1.26 | 1.36 |
| Logistic | 2 | $RE_1$ | 2.73 | 1.78 | 1.39 | 1.16 | 1.04 | 1.000 | 1.04 | 1.16 | 1.39 | 1.78 | 2.73 |
|  |  | $RE_2$ | 1.58 | 1.30 | 1.17 | 1.08 | 1.02 | 1.000 | 1.02 | 1.08 | 1.17 | 1.30 | 1.58 |
|  | 3 | $RE_1$ | 1.88 | 1.31 | 1.09 | 1.005 | 1.02 | 1.12 | 1.31 | 1.61 | 2.06 | 2.74 | 4.14 |
|  |  | $RE_2$ | 1.32 | 1.12 | 1.04 | 1.002 | 1.01 | 1.05 | 1.12 | 1.21 | 1.31 | 1.43 | 1.61 |


Table 4: Values of $RE_1$ and $RE_2$ to compare the FI content of imperfect PROS data with its SRS and
RSS counterparts of the same size for some distributions when $S = 12$.

| Distribution | n | Design | p = 0 | 0.1 | 0.2 | 0.3 | 0.4 | 0.5 | 0.6 | 0.7 | 0.8 | 0.9 | 1 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Normal | 2 | $RE_1$ | 3.15 | 1.96 | 1.48 | 1.20 | 1.05 | 1.000 | 1.05 | 1.20 | 1.48 | 1.96 | 3.15 |
|  |  | $RE_2$ | 1.87 | 1.46 | 1.26 | 1.12 | 1.03 | 1.000 | 1.03 | 1.12 | 1.26 | 1.46 | 1.87 |
|  | 3 | $RE_1$ | 2.51 | 1.49 | 1.13 | 1.007 | 1.03 | 1.18 | 1.46 | 1.90 | 2.56 | 3.58 | 5.74 |
|  |  | $RE_2$ | 1.77 | 1.29 | 1.09 | 1.005 | 1.02 | 1.11 | 1.26 | 1.46 | 1.68 | 1.93 | 2.32 |
| Exponential | 2 | $RE_1$ | 2.39 | 1.69 | 1.35 | 1.15 | 1.04 | 1.000 | 1.04 | 1.15 | 1.35 | 1.69 | 2.39 |
|  |  | $RE_2$ | 1.70 | 1.38 | 1.21 | 1.10 | 1.02 | 1.000 | 1.02 | 1.10 | 1.21 | 1.38 | 1.70 |
|  | 3 | $RE_1$ | 1.85 | 1.31 | 1.09 | 1.005 | 1.02 | 1.12 | 1.30 | 1.57 | 1.93 | 2.43 | 3.30 |
|  |  | $RE_2$ | 1.48 | 1.19 | 1.06 | 1.003 | 1.01 | 1.08 | 1.18 | 1.31 | 1.45 | 1.61 | 1.82 |
| Logistic | 2 | $RE_1$ | 3.56 | 2.14 | 1.57 | 1.24 | 1.06 | 1.000 | 1.06 | 1.24 | 1.57 | 2.14 | 3.56 |
|  |  | $RE_2$ | 2.06 | 1.56 | 1.32 | 1.15 | 1.04 | 1.000 | 1.04 | 1.15 | 1.32 | 1.56 | 2.06 |
|  | 3 | $RE_1$ | 2.72 | 1.55 | 1.15 | 1.008 | 1.03 | 1.20 | 1.53 | 2.04 | 2.81 | 4.04 | 6.65 |
|  |  | $RE_2$ | 1.90 | 1.33 | 1.10 | 1.005 | 1.02 | 1.13 | 1.30 | 1.53 | 1.79 | 2.09 | 2.56 |


Nevertheless, it is worth mentioning that the FI and uncertainty content play important roles in different inferential
aspects of the PROS sampling designs including, for instance, maximum likelihood (ML) estimation and its properties.
The FI matrix is a key concept in the theory of statistical inference, particularly in the theory of the ML estimation
problem (Lehmann and Casella, 1998). It is used to derive the asymptotic distribution of the MLE and to calculate the
covariance matrices associated with ML estimates, as well as in Bayesian statistics.


Table 5: Values of $RE_1$ and $RE_2$ to compare the FI content of imperfect PROS data with its SRS and RSS counterparts of
the same size for some distributions based on different Dell-Clutter parameters when $S \in \{6, 12\}$.

| Distribution | n | Design | S=6, ρ=0.25 | 0.50 | 0.75 | 0.90 | 1.00 | S=12, ρ=0.25 | 0.50 | 0.75 | 0.90 | 1.00 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Normal | 2 | $RE_1$ | 1.02 | 1.10 | 1.27 | 1.51 | 2.48 | 1.03 | 1.13 | 1.36 | 1.68 | 3.15 |
|  |  | $RE_2$ | 1.00 | 1.02 | 0.98 | 0.96 | 1.47 | 1.01 | 1.04 | 1.05 | 1.06 | 1.87 |
|  | 3 | $RE_1$ | 1.03 | 1.15 | 1.43 | 1.85 | 3.70 | 1.04 | 1.20 | 1.57 | 2.16 | 5.75 |
|  |  | $RE_2$ | 1.01 | 1.02 | 1.02 | 1.05 | 1.50 | 1.01 | 1.07 | 1.12 | 1.23 | 2.32 |
| Exponential | 2 | $RE_1$ | 1.02 | 1.06 | 1.18 | 1.31 | 1.92 | 1.02 | 1.08 | 1.22 | 1.44 | 2.38 |
|  |  | $RE_2$ | 1.00 | 1.00 | 0.99 | 0.96 | 1.37 | 1.01 | 1.02 | 1.03 | 1.06 | 1.69 |
|  | 3 | $RE_1$ | 1.03 | 1.11 | 1.30 | 1.56 | 2.47 | 1.03 | 1.14 | 1.37 | 1.73 | 3.44 |
|  |  | $RE_2$ | 1.00 | 1.02 | 1.02 | 1.04 | 1.35 | 1.01 | 1.05 | 1.07 | 1.16 | 1.89 |
| Logistic | 2 | $RE_1$ | 1.02 | 1.10 | 1.31 | 1.55 | 2.69 | 1.04 | 1.16 | 1.40 | 1.78 | 3.54 |
|  |  | $RE_2$ | 1.00 | 1.00 | 1.01 | 0.96 | 1.56 | 1.01 | 1.06 | 1.08 | 1.10 | 2.05 |
|  | 3 | $RE_1$ | 1.03 | 1.16 | 1.49 | 1.95 | 4.13 | 1.04 | 1.21 | 1.64 | 2.28 | 6.76 |
|  |  | $RE_2$ | 1.00 | 1.01 | 1.04 | 1.06 | 1.60 | 1.01 | 1.06 | 1.15 | 1.24 | 2.62 |
| $\Psi = (\pi, h) = (0.3, 1/3)$ | 2 | $RE_1$ | 1.04 | 1.13 | 1.44 | 1.80 | 3.83 | 1.05 | 1.21 | 1.51 | 2.02 | 5.11 |
|  |  | $RE_2$ | 1.01 | 0.98 | 0.95 | 0.90 | 2.36 | 1.02 | 1.04 | 1.00 | 1.00 | 3.16 |
|  | 3 | $RE_1$ | 1.06 | 1.22 | 1.59 | 2.07 | 4.51 | 1.06 | 1.23 | 1.65 | 2.30 | 6.44 |
|  |  | $RE_2$ | 1.01 | 1.02 | 1.00 | 1.00 | 1.91 | 1.02 | 1.03 | 1.04 | 1.11 | 2.73 |
| $\Psi = (\pi, h) = (0.3, 1/9)$ | 2 | $RE_1$ | 1.02 | 1.10 | 1.26 | 1.48 | 3.39 | 1.03 | 1.10 | 1.25 | 1.50 | 5.25 |
|  |  | $RE_2$ | 1.00 | 0.99 | 0.92 | 0.85 | 2.10 | 1.00 | 0.98 | 0.91 | 0.86 | 3.25 |
|  | 3 | $RE_1$ | 1.03 | 1.14 | 1.41 | 1.72 | 4.45 | 1.04 | 1.15 | 1.42 | 1.71 | 7.59 |
|  |  | $RE_2$ | 1.00 | 0.99 | 0.97 | 0.92 | 1.97 | 1.01 | 1.00 | 0.97 | 0.92 | 3.35 |
| $\Psi = (\pi, h) = (0.9, 1/3)$ | 2 | $RE_1$ | 1.05 | 1.19 | 1.50 | 2.02 | 3.67 | 1.05 | 1.22 | 1.61 | 2.16 | 4.34 |
|  |  | $RE_2$ | 1.01 | 1.02 | 0.97 | 0.91 | 2.15 | 1.01 | 1.05 | 1.04 | 0.98 | 2.54 |


3 

REi 

1.04 

1.20 

1.56 

2.13 

5.24 

1.06 

1.25 

1.70 

2.39 

6.60 



RE2 

1.00 

1.01 

0.97 

1.01 

2.05 

1.02 

1.05 

1.06 

1.13 

2.59 

4^ = (tt, h) 

2 

REi 

1.02 

1.09 

1.23 

1.46 

2.85 

1.03 

1.11 

1.27 

1.57 

3.57 

(0.9,1/9) 


RE2 

1.00 

0.98 

0.91 

0.83 

1.74 

1.01 

1.00 

0.94 

0.89 

2.18 


3 

REi 

1.03 

1.12 

1.36 

1.76 

4.33 

1.04 

1.16 

1.44 

1.86 

6.95 



RE2 

1.00 

1.00 

0.97 

0.98 

1.84 

1.01 

1.03 

1.02 

1.03 

2.96 


4 Other Information Criteria 


The concept of information and uncertainty of random samples is so rich that several measures have been proposed
to study different aspects of these concepts. For example, in engineering studies, the Shannon entropy, Renyi
entropy and KL information measures are used more often than FI to quantify the information and uncertainty
structures of random samples. These measures quantify the amount of uncertainty inherent in the joint probability
distribution of a random sample and have been applied in many areas such as ecological studies, computer science
and information technology, in different contexts including order statistics, spacings, censored data, reliability,
life testing, record data and text analysis. For more details see Jafari Jozani and Ahmadi (2014) and Johnson (2004)
and references therein.


In this section, we compare the Shannon entropy, Renyi entropy and KL information of PROS data with SRS

Table 6: Values of RE_2 to compare the FI content of imperfect PROS(n, S) with imperfect RSS of a fixed set size
S ∈ {6, 12} under different Dell-Clutter models.

                                     S = 6, p                                            S = 12, p
Distribution    S   n   N    0.25   0.50   0.75   0.90   1.00  |    S   n   N    0.25   0.50   0.75   0.90   1.00
Normal          4   2   3    0.97   0.88   0.73   0.58   0.39  |    6   2   6    0.97   0.84   0.61   0.40   0.16
                6   2   3    0.98   0.89   0.75   0.60   0.44  |    6   3   4    0.98   0.90   0.70   0.49   0.25
                6   3   2    0.99   0.94   0.86   0.75   0.67  |   12   2   6    0.97   0.87   0.67   0.45   0.21
                8   2   3    0.98   0.90   0.78   0.62   0.49  |   12   3   4    0.98   0.91   0.77   0.57   0.39
               12   2   3    0.99   0.91   0.82   0.68   0.56  |   12   4   3    0.99   0.95   0.83   0.68   0.55
               12   3   2    0.99   0.98   0.93   0.86   1.02  |   12   6   2    1.00   0.99   0.92   0.85   0.81
               12   6   1    1.01   1.03   1.11   1.26   2.04  |   12  12   1    1.00   1.01   1.01   1.03   1.03
Exponential     4   2   3    1.00   1.00   1.02   1.02   1.03  |    6   2   6    0.96   0.84   0.65   0.52   0.35
                6   2   3    0.98   0.90   0.78   0.69   0.65  |    6   3   4    0.97   0.87   0.73   0.62   0.45
                6   3   2    0.99   0.95   0.88   0.82   0.83  |   12   2   6    0.96   0.86   0.68   0.57   0.44
                8   2   3    0.98   0.91   0.80   0.71   0.70  |   12   3   4    0.97   0.91   0.76   0.67   0.62
               12   2   3    0.98   0.92   0.82   0.75   0.80  |   12   4   3    0.98   0.94   0.84   0.76   0.76
               12   3   2    0.99   0.97   0.92   0.90   1.14  |   12   6   2    0.99   0.97   0.91   0.87   0.88
               12   6   1    1.01   1.04   1.09   1.16   1.61  |   12  12   1    1.00   1.00   0.99   0.99   0.99
Logistic        4   2   3    0.97   0.88   0.71   0.56   0.36  |    6   2   6    0.96   0.82   0.59   0.38   0.16
                6   2   3    0.97   0.89   0.72   0.58   0.43  |    6   3   4    0.98   0.88   0.69   0.47   0.24
                6   3   2    0.99   0.94   0.85   0.74   0.66  |   12   2   6    0.96   0.87   0.64   0.43   0.20
                8   2   3    0.97   0.92   0.76   0.61   0.49  |   12   3   4    0.99   0.90   0.74   0.55   0.38
               12   2   3    0.98   0.93   0.80   0.65   0.56  |   12   4   3    0.99   0.92   0.81   0.66   0.54
               12   3   2    0.99   0.99   0.91   0.87   1.08  |   12   6   2    0.99   0.96   0.88   0.81   0.75
               12   6   1    1.00   1.03   1.13   1.26   2.12  |   12  12   1    1.00   1.00   1.00   0.98   0.99


Table 7: Values of RE_2 to compare the FI content of imperfect PROS(n, S) with imperfect RSS of a fixed set size 6.

                                                             p
Distribution    S   n   N       0    0.1    0.2    0.3    0.4    0.5    0.6    0.7    0.8    0.9      1
Normal          4   2   3    1.75   1.49   1.25   1.03   0.83   0.67   0.56   0.48   0.42   0.38   0.37
                6   2   3    2.03   1.63   1.33   1.05   0.83   0.67   0.56   0.49   0.44   0.42   0.43
                6   3   2    1.49   1.24   1.07   0.93   0.82   0.74   0.69   0.66   0.64   0.63   0.65
                6   6   1    1.00   1.00   1.00   1.00   1.00   1.00   1.00   1.00   1.00   1.00   1.00
                8   2   3    2.23   1.73   1.38   1.07   0.84   0.67   0.56   0.50   0.46   0.45   0.47
               12   2   3    2.59   1.92   1.48   1.11   0.85   0.67   0.57   0.52   0.49   0.49   0.54
               12   3   2    2.09   1.46   1.13   0.93   0.83   0.79   0.80   0.82   0.86   0.91   1.01
               12   6   1    1.22   1.03   1.01   1.08   1.20   1.36   1.51   1.66   1.80   1.93   2.05
Exponential     4   2   3    1.51   1.34   1.18   1.03   0.89   0.77   0.69   0.63   0.59   0.56   0.56
                6   2   3    1.71   1.44   1.23   1.05   0.89   0.77   0.69   0.64   0.61   0.61   0.64
                6   3   2    1.33   1.17   1.05   0.95   0.88   0.83   0.80   0.79   0.79   0.80   0.82
                6   6   1    1.00   1.00   1.00   1.00   1.00   1.00   1.00   1.00   1.00   1.00   1.00
                8   2   3    1.89   1.54   1.28   1.07   0.90   0.77   0.70   0.65   0.64   0.65   0.70
               12   2   3    2.11   1.65   1.34   1.09   0.90   0.77   0.70   0.67   0.66   0.69   0.78
               12   3   2    1.66   1.30   1.09   0.96   0.89   0.87   0.88   0.91   0.96   1.02   1.11
               12   6   1    1.14   1.02   1.00   1.05   1.14   1.24   1.34   1.42   1.50   1.56   1.62
Logistic        4   2   3    1.89   1.57   1.30   1.04   0.83   0.66   0.55   0.47   0.42   0.38   0.37
                6   2   3    2.25   1.73   1.38   1.07   0.83   0.66   0.55   0.48   0.44   0.42   0.44
                6   3   2    1.55   1.27   1.08   0.93   0.82   0.74   0.69   0.67   0.65   0.65   0.67
                6   6   1    1.00   1.00   1.00   1.00   1.00   1.00   1.00   1.00   1.00   1.00   1.00
                8   2   3    2.50   1.87   1.45   1.10   0.84   0.66   0.56   0.49   0.46   0.46   0.49
               12   2   3    2.94   2.08   1.56   1.14   0.85   0.66   0.56   0.51   0.50   0.51   0.58
               12   3   2    2.23   1.51   1.14   0.93   0.83   0.80   0.81   0.84   0.89   0.96   1.07
               12   6   1    1.23   1.03   1.01   1.09   1.22   1.39   1.57   1.73   1.89   2.03   2.17

Table 8: Values of RE_2 to compare the FI content of imperfect PROS(n, S) with imperfect RSS of a fixed set size 12.

                                                             p
Distribution    S   n   N       0    0.1    0.2    0.3    0.4    0.5    0.6    0.7    0.8    0.9      1
Normal          6   2   6    2.24   1.67   1.17   0.78   0.52   0.36   0.27   0.21   0.18   0.16   0.16
                6   3   4    1.62   1.27   0.94   0.68   0.51   0.40   0.33   0.29   0.26   0.24   0.24
               12   2   6    2.86   1.98   1.31   0.82   0.52   0.36   0.27   0.22   0.20   0.19   0.21
               12   3   4    2.27   1.49   1.00   0.69   0.51   0.43   0.38   0.36   0.35   0.35   0.37
               12   4   3    1.78   1.23   0.89   0.69   0.59   0.53   0.50   0.49   0.49   0.50   0.52
               12   6   2    1.31   1.05   0.88   0.79   0.74   0.72   0.71   0.70   0.71   0.71   0.73
               12  12   1    1.00   1.00   1.00   1.00   1.00   1.00   1.00   0.99   0.99   0.99   0.99
Exponential     6   2   6    1.80   1.46   1.14   0.86   0.66   0.53   0.44   0.39   0.36   0.35   0.36
                6   3   4    1.38   1.18   0.97   0.79   0.65   0.56   0.51   0.47   0.46   0.45   0.46
               12   2   6    2.23   1.68   1.24   0.90   0.67   0.53   0.44   0.40   0.39   0.40   0.44
               12   3   4    1.78   1.33   1.01   0.79   0.66   0.59   0.56   0.56   0.57   0.59   0.64
               12   4   3    1.46   1.15   0.93   0.80   0.72   0.69   0.68   0.68   0.70   0.72   0.75
               12   6   2    1.19   1.03   0.93   0.87   0.84   0.83   0.84   0.85   0.86   0.87   0.89
               12  12   1    1.00   1.00   1.00   1.00   1.00   1.00   1.00   1.00   1.00   1.00   1.00
Logistic        6   2   6    2.41   1.76   1.21   0.78   0.51   0.35   0.25   0.20   0.17   0.16   0.16
                6   3   4    1.69   1.30   0.95   0.67   0.50   0.39   0.32   0.28   0.26   0.24   0.24
               12   2   6    3.14   2.11   1.36   0.83   0.51   0.35   0.26   0.22   0.19   0.19   0.20
               12   3   4    2.43   1.55   1.00   0.68   0.50   0.42   0.37   0.36   0.35   0.36   0.38
               12   4   3    1.87   1.25   0.89   0.69   0.58   0.53   0.51   0.50   0.51   0.52   0.55
               12   6   2    1.33   1.05   0.88   0.79   0.74   0.72   0.71   0.72   0.72   0.73   0.75
               12  12   1    1.00   1.00   1.00   1.00   1.00   1.00   1.00   1.00   1.00   1.00   1.00


and RSS data of the same size. Throughout this section, the subsetting process of PROS design and the ranking 
process of RSS are assumed to be perfect. 


4.1 Shannon Entropy of the PROS sample 

Let $X$ be a continuous random variable with pdf $f(\cdot; \theta)$. The Shannon entropy associated with $X$ is defined as

$$H(X; \theta) = -\int f(x; \theta) \log f(x; \theta)\, dx,$$

subject to the existence of the integral. The Shannon entropy, as a quantitative measure of information (uncertainty),
is extensively used in information technology, computer science and other engineering fields. In practice, smaller
values of the Shannon entropy are more desirable (see Johnson, 2004). The Shannon entropy content of a SRS of
size $n$ is given by

$$H_n(\mathbf{X}_{srs}; \theta) = -\sum_{i=1}^{n} \int f(x_i; \theta) \log f(x_i; \theta)\, dx_i = n H(X_1; \theta).$$

Similarly, for an RSS of size $n$ (with the set size $n$),

$$H_n(\mathbf{X}_{rss}; \theta) = -\sum_{i=1}^{n} \int f^{(i:n)}(x; \theta) \log f^{(i:n)}(x; \theta)\, dx,$$

where $f^{(i:n)}(\cdot; \theta)$ is the pdf of the $i$-th order statistic in a SRS of size $n$ from $f(\cdot; \theta)$.
Furthermore, for a PROS$(n, S)$ sample, it is easy to see that

$$H_n(\mathbf{X}_{pros}; \theta) = -\sum_{r=1}^{n} \int f_{(d_r)}(y; \theta) \log f_{(d_r)}(y; \theta)\, dy.$$

In the following lemma, we show that the Shannon entropy of PROS data is smaller than that of SRS data of the
same size. Unfortunately, we were not able to obtain an ordering relationship between the Shannon entropies of RSS
and PROS data of the same size. Instead, we obtain a lower bound for the Shannon entropy of a PROS$(n, S)$ sample
in terms of the Shannon entropy of an RSS of size $S$ when the set size is $S$.


Lemma 3. Let $\mathbf{X}_{pros}$ be a PROS$(n, S)$ sample from a population with pdf $f(\cdot; \theta)$ and let $m = S/n$ be the
number of observations in each subset. Suppose $\mathbf{X}_{srs}$ is a SRS of size $n$ from $f(\cdot; \theta)$ with the Shannon entropy
$H_n(\mathbf{X}_{srs}; \theta)$, and let $H_S(\mathbf{X}_{rss}; \theta)$ represent the Shannon entropy of an RSS of size $S$ when the set size is $S$. Then,

$$\frac{1}{m} H_S(\mathbf{X}_{rss}; \theta) \le H_n(\mathbf{X}_{pros}; \theta) \le H_n(\mathbf{X}_{srs}; \theta), \quad \text{for all } n \in \mathbb{N}.$$

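Proof. Using the identity $f(x; \theta) = \frac{1}{n}\sum_{r=1}^{n} f_{(d_r)}(x; \theta)$ and convexity of $h(t) = t \log t$, $t > 0$, we have

$$H_n(\mathbf{X}_{pros}; \theta) = -n \int \frac{1}{n}\sum_{r=1}^{n} f_{(d_r)}(x; \theta) \log f_{(d_r)}(x; \theta)\, dx
\le -n \int \Big(\frac{1}{n}\sum_{r=1}^{n} f_{(d_r)}(x; \theta)\Big) \log\Big(\frac{1}{n}\sum_{r=1}^{n} f_{(d_r)}(x; \theta)\Big) dx
= H_n(\mathbf{X}_{srs}; \theta).$$

Furthermore, using (2) and convexity of $h(t) = t \log t$, $t > 0$, we have

$$H_n(\mathbf{X}_{pros}; \theta) = -\sum_{r=1}^{n} \int \Big(\frac{1}{m}\sum_{u \in d_r} f^{(u:S)}(x; \theta)\Big) \log\Big(\frac{1}{m}\sum_{u \in d_r} f^{(u:S)}(x; \theta)\Big) dx$$

$$\ge -\frac{1}{m}\sum_{r=1}^{n}\sum_{u \in d_r} \int f^{(u:S)}(x; \theta) \log f^{(u:S)}(x; \theta)\, dx
= \frac{1}{m} H_S(\mathbf{X}_{rss}; \theta),$$

which completes the proof. □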
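
As a numerical illustration of Lemma 3 (a minimal sketch, not part of the original derivation), the three quantities can be approximated for a standard exponential parent with n = 2 and S = 6. The parent distribution, the integration range [0, 50], the midpoint rule and all function names below are assumptions made for this example. Perfect subsetting is assumed, and the subsets are taken to be d_r = {(r-1)m+1, ..., rm} with m = S/n.

```python
import math

def order_stat_pdf(x, u, S, pdf, cdf):
    # pdf of the u-th order statistic in a simple random sample of size S
    F = cdf(x)
    return S * math.comb(S - 1, u - 1) * F ** (u - 1) * (1.0 - F) ** (S - u) * pdf(x)

def pros_pdf(x, r, n, S, pdf, cdf):
    # pdf of the r-th PROS(n, S) observation: average of the m = S/n
    # order-statistic pdfs whose ranks fall in the assumed subset d_r
    m = S // n
    ranks = range((r - 1) * m + 1, r * m + 1)
    return sum(order_stat_pdf(x, u, S, pdf, cdf) for u in ranks) / m

def entropy_term(density, lo, hi, steps=10000):
    # midpoint-rule approximation of -integral density * log(density)
    width = (hi - lo) / steps
    total = 0.0
    for k in range(steps):
        p = density(lo + (k + 0.5) * width)
        if p > 0.0:
            total -= p * math.log(p) * width
    return total

def lemma3_bounds(n, S, pdf, cdf, lo, hi):
    # returns ((1/m) * H_S(rss), H_n(pros), H_n(srs))
    m = S // n
    h_srs = n * entropy_term(pdf, lo, hi)
    h_pros = sum(entropy_term(lambda x, r=r: pros_pdf(x, r, n, S, pdf, cdf), lo, hi)
                 for r in range(1, n + 1))
    h_rss = sum(entropy_term(lambda x, u=u: order_stat_pdf(x, u, S, pdf, cdf), lo, hi)
                for u in range(1, S + 1))
    return h_rss / m, h_pros, h_srs

exp_pdf = lambda x: math.exp(-x)          # hypothetical parent: standard exponential
exp_cdf = lambda x: 1.0 - math.exp(-x)
lower, h_pros, h_srs = lemma3_bounds(2, 6, exp_pdf, exp_cdf, 0.0, 50.0)
print(lower, h_pros, h_srs)
```

Under these assumptions the three printed values should appear in increasing order, in agreement with the lemma, and the last one should be close to 2, the Shannon entropy of two independent standard exponential observations.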


4.2 Renyi entropy of PROS data 

In this section we use the Renyi entropy as a quantitative measure of the entropy associated with PROS data $\mathbf{X}_{pros}$.
The Renyi entropy of a random variable $X$ with pdf $f(\cdot; \theta)$ is defined as follows

$$H_\alpha(X; \theta) = \frac{1}{1 - \alpha} \log E\left[f^{\alpha - 1}(X; \theta)\right],$$

where $\alpha > 0$, $\alpha \neq 1$. The Renyi entropy is a very general measure and includes the Shannon entropy as its special
case due to the following relationship

$$\lim_{\alpha \to 1} H_\alpha(X; \theta) = -\int f(x; \theta) \log f(x; \theta)\, dx = H(X; \theta).$$
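
The limiting relationship can be checked numerically. The sketch below is only an illustration under assumptions of our own choosing: a standard exponential parent, for which the Renyi entropy has the closed form -log(α)/(1-α) and the Shannon entropy equals 1, a midpoint rule on [0, 60], and a hypothetical helper name `renyi_entropy`.

```python
import math

def renyi_entropy(pdf, alpha, lo, hi, steps=60000):
    # (1 / (1 - alpha)) * log( integral pdf(x)**alpha dx ), midpoint rule
    width = (hi - lo) / steps
    integral = sum(pdf(lo + (k + 0.5) * width) ** alpha for k in range(steps)) * width
    return math.log(integral) / (1.0 - alpha)

exp_pdf = lambda x: math.exp(-x)   # standard exponential parent (assumed for the example)

results = []
for alpha in (0.5, 0.9, 0.99, 0.999):
    numeric = renyi_entropy(exp_pdf, alpha, 0.0, 60.0)
    closed_form = -math.log(alpha) / (1.0 - alpha)
    results.append((alpha, numeric, closed_form))
    print(alpha, numeric, closed_form)
```

As α approaches 1 the numerical and closed-form values should both approach the Shannon entropy of the standard exponential distribution, which is 1.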

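Due to the flexibility of the Renyi entropy, $H_\alpha(X; \theta)$ has been used in many fields such as statistics, ecology
and engineering. We derive the Renyi entropy of $\mathbf{X}_{pros}$ and compare it with the Renyi entropy of $\mathbf{X}_{srs}$. We
present the results for $0 < \alpha < 1$; the case $\alpha > 1$, which requires further investigation, will be presented in
later work. To this end, the Renyi entropy of a SRS of size $n$ is given by

$$H_{\alpha,n}(\mathbf{X}_{srs}; \theta) = \frac{1}{1 - \alpha} \sum_{i=1}^{n} \log \int f^{\alpha}(x_i; \theta)\, dx_i = n H_\alpha(X_1; \theta);$$

and for an RSS with set size $n$,

$$H_{\alpha,n}(\mathbf{X}_{rss}; \theta) = \frac{1}{1 - \alpha} \sum_{i=1}^{n} \log \int \left[f^{(i:n)}(x; \theta)\right]^{\alpha} dx.$$

Also, for a PROS$(n, S)$ sample, one gets

$$H_{\alpha,n}(\mathbf{X}_{pros}; \theta) = \frac{1}{1 - \alpha} \sum_{r=1}^{n} \log \int \left[f_{(d_r)}(x; \theta)\right]^{\alpha} dx.$$

Lemma 4. Let $H_{\alpha,n}(\mathbf{X}_{pros}; \theta)$ represent the Renyi entropy of a PROS$(n, S)$ sample of size $n$ from a population
with pdf $f(\cdot; \theta)$. Suppose $\mathbf{X}_{srs}$ and $\mathbf{X}_{rss}$ are a SRS of size $n$ and an RSS of size $S$ (with the set size $S$) from
$f(\cdot; \theta)$, respectively. For any $0 < \alpha < 1$ and all $n \in \mathbb{N}$, we have

$$\frac{1}{m} H_{\alpha,S}(\mathbf{X}_{rss}; \theta) \le H_{\alpha,n}(\mathbf{X}_{pros}; \theta) \le H_{\alpha,n}(\mathbf{X}_{srs}; \theta).$$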


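Proof. By using the representations $f(x; \theta) = \frac{1}{n}\sum_{r=1}^{n} f_{(d_r)}(x; \theta)$ and
$f_{(d_r)}(x; \theta) = \frac{1}{m}\sum_{u \in d_r} f^{(u:S)}(x; \theta)$, together with the concavity of the functions
$h_1(t) = \log t$ and $h_2(t) = t^{\alpha}$, we have

$$H_{\alpha,n}(\mathbf{X}_{pros}; \theta) = \frac{1}{1 - \alpha} \sum_{r=1}^{n} \log \int \Big(\frac{1}{m}\sum_{u \in d_r} f^{(u:S)}(x; \theta)\Big)^{\alpha} dx
\le \frac{n}{1 - \alpha} \log \int \Big(\frac{1}{nm}\sum_{r=1}^{n}\sum_{u \in d_r} f^{(u:S)}(x; \theta)\Big)^{\alpha} dx
= H_{\alpha,n}(\mathbf{X}_{srs}; \theta).$$

Similarly, one can show the following inequalities

$$H_{\alpha,n}(\mathbf{X}_{pros}; \theta) \ge \frac{1}{1 - \alpha} \sum_{r=1}^{n} \log\Big(\frac{1}{m}\sum_{u \in d_r} \int \left[f^{(u:S)}(x; \theta)\right]^{\alpha} dx\Big)
\ge \frac{1}{1 - \alpha}\,\frac{1}{m}\sum_{r=1}^{n}\sum_{u \in d_r} \log \int \left[f^{(u:S)}(x; \theta)\right]^{\alpha} dx
= \frac{1}{m} H_{\alpha,S}(\mathbf{X}_{rss}; \theta),$$

which completes the proof. □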
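
A quick numerical check of Lemma 4 can be sketched for a Uniform(0, 1) parent, where the order-statistic pdfs are Beta densities and the SRS Renyi entropy is zero. The parent distribution, the choice α = 0.5 with n = 3 and S = 6, the subsets d_r = {(r-1)m+1, ..., rm} and the function names are all assumptions made for illustration.

```python
import math

def beta_pdf(t, u, S):
    # pdf of the u-th order statistic of S independent Uniform(0, 1) draws
    return S * math.comb(S - 1, u - 1) * t ** (u - 1) * (1.0 - t) ** (S - u)

def renyi_sum(densities, alpha, steps=20000):
    # (1 / (1 - alpha)) * sum over densities of log( integral_0^1 density**alpha )
    width = 1.0 / steps
    total = 0.0
    for density in densities:
        integral = sum(density((k + 0.5) * width) ** alpha for k in range(steps)) * width
        total += math.log(integral)
    return total / (1.0 - alpha)

def lemma4_bounds(n, S, alpha):
    # returns ((1/m) * H_{alpha,S}(rss), H_{alpha,n}(pros), H_{alpha,n}(srs))
    m = S // n
    subsets = [range((r - 1) * m + 1, r * m + 1) for r in range(1, n + 1)]
    pros = [lambda t, d=d: sum(beta_pdf(t, u, S) for u in d) / m for d in subsets]
    rss = [lambda t, u=u: beta_pdf(t, u, S) for u in range(1, S + 1)]
    srs = [lambda t: 1.0] * n
    return renyi_sum(rss, alpha) / m, renyi_sum(pros, alpha), renyi_sum(srs, alpha)

lower, h_pros, h_srs = lemma4_bounds(n=3, S=6, alpha=0.5)
print(lower, h_pros, h_srs)
```

With these choices the middle value should lie between the other two, and the SRS value should be numerically zero.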


4.3 KL Information of the PROS technique 

The Kullback-Leibler (KL) discrepancy is another measure which can be used to quantify the information regarding
a random phenomenon by comparing two probability density functions corresponding to a random experiment.
Consider two pdfs $f(\cdot; \theta)$ and $g(\cdot; \theta)$. The KL information measure based on $f(\cdot; \theta)$ and $g(\cdot; \theta)$ is defined by

$$K(f, g) = \int f(t; \theta) \log \frac{f(t; \theta)}{g(t; \theta)}\, dt,$$

which quantifies the information lost by using $g(\cdot; \theta)$ for the density of the random variable $X$ instead of $f(\cdot; \theta)$.
In this section, using the KL measure we make a comparison among PROS sampling, simple random sampling
and ranked set sampling designs to determine which design provides more informative samples from the underlying
population. To this end, we use

$$K\left(L_{pros}(\theta|\mathbf{y}), L_{srs}(\theta|\mathbf{y})\right) = \int L_{pros}(\theta|\mathbf{y}) \log \frac{L_{pros}(\theta|\mathbf{y})}{L_{srs}(\theta|\mathbf{y})}\, d\mathbf{y} \qquad (10)$$

to compare PROS$(n, S)$ and simple random sampling designs, where $L_{pros}(\theta|\mathbf{y})$ and $L_{srs}(\theta|\mathbf{y})$ denote the likelihood
functions of PROS and SRS data of the same size, respectively. The KL information measure for comparing ranked
set sampling and simple random sampling is defined similarly by using (10) and setting $S = n$ in the PROS sampling
design. One can interpret (10) in terms of a hypothesis testing problem within the Neyman-Pearson log-likelihood
ratio testing framework (see Johnson, 2004).


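Lemma 5. Let $L_{pros}(\theta|\mathbf{y})$ and $L_{srs}(\theta|\mathbf{y})$ denote, respectively, the likelihood functions of a PROS$(n, S)$ sample and
a SRS of size $n$ from a population with pdf $f(\cdot; \theta)$. Then we have

$$K\left(L_{pros}(\theta|\mathbf{y}), L_{srs}(\theta|\mathbf{y})\right) = \sum_{r=1}^{n} \int f_{(d_r)}(y; \theta) \log \frac{f_{(d_r)}(y; \theta)}{f(y; \theta)}\, dy.$$

Proof. To show the result, using (10) we have

$$K\left(L_{pros}(\theta|\mathbf{y}), L_{srs}(\theta|\mathbf{y})\right) = \int \prod_{h=1}^{n} f_{(d_h)}(y_h; \theta) \Big\{\sum_{r=1}^{n} \log \frac{f_{(d_r)}(y_r; \theta)}{f(y_r; \theta)}\Big\} \prod_{j=1}^{n} dy_j
= \sum_{r=1}^{n} \int f_{(d_r)}(y; \theta) \log \frac{f_{(d_r)}(y; \theta)}{f(y; \theta)}\, dy,$$

where the last equality follows from the independence of observations and the fact that $n - 1$ of the integrals are
1. □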
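
One consequence of this expression that is easy to check numerically: under perfect subsetting the ratio f_(d_r)(y; θ)/f(y; θ) depends on y only through F(y; θ), so for a continuous parent the KL information between the PROS and SRS likelihoods should not depend on the parent distribution. The sketch below compares a standard exponential parent with a Uniform(0, 1) parent. The two parents, the subsets d_r = {(r-1)m+1, ..., rm}, the midpoint rule and the function names are assumptions made for this illustration.

```python
import math

def subset_weight(t, ranks, S):
    # (1/m) * sum of Beta(u, S-u+1) densities at t = F(y); equals f_(d_r)(y) / f(y)
    m = len(ranks)
    return sum(S * math.comb(S - 1, u - 1) * t ** (u - 1) * (1.0 - t) ** (S - u)
               for u in ranks) / m

def kl_pros_vs_srs(n, S, pdf, cdf, lo, hi, steps=20000):
    # sum_r integral f_(d_r)(y) * log( f_(d_r)(y) / f(y) ) dy, midpoint rule
    m = S // n
    width = (hi - lo) / steps
    total = 0.0
    for r in range(1, n + 1):
        ranks = range((r - 1) * m + 1, r * m + 1)
        for k in range(steps):
            y = lo + (k + 0.5) * width
            w = subset_weight(cdf(y), ranks, S)
            if w > 0.0:
                total += pdf(y) * w * math.log(w) * width
    return total

exp_pdf = lambda y: math.exp(-y)
exp_cdf = lambda y: 1.0 - math.exp(-y)
k_exp = kl_pros_vs_srs(2, 6, exp_pdf, exp_cdf, 0.0, 50.0)
k_unif = kl_pros_vs_srs(2, 6, lambda t: 1.0, lambda t: t, 0.0, 1.0)
print(k_exp, k_unif)
```

The two printed values should agree up to integration error and should be positive.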

In the following lemma, we show that the KL information distance between the likelihoods of PROS and SRS sampling
designs is greater than the one between the likelihoods of two SRS sampling designs. Hence, PROS data are more
informative than SRS data about the underlying population. We also obtain an upper bound for the KL information
between the likelihoods of PROS and SRS data of the same size in terms of an RSS of size $S$.

Lemma 6. Let $L_{pros}(\theta|\mathbf{y})$ denote the likelihood function of a PROS$(n, S)$ sample from a population with pdf $f(\cdot; \theta)$.
Suppose $L_{srs,1}(\theta|\mathbf{y})$ and $L_{srs,2}(\theta|\mathbf{y})$ denote the likelihood functions of simple random samples of size $n$ from $f(\cdot; \theta)$
and $g(\cdot; \theta)$, respectively. In addition, let $L_{rss^*}(\theta|\mathbf{y})$ represent the likelihood function of a RSS of size $S$ when the set
size is $S$. Then,

$$K\left(L_{srs,1}(\theta|\mathbf{y}), L_{srs,2}(\theta|\mathbf{y})\right) \le K\left(L_{pros}(\theta|\mathbf{y}), L_{srs,2}(\theta|\mathbf{y})\right) \le \frac{1}{m} K\left(L_{rss^*}(\theta|\mathbf{y}), L_{srs,2}(\theta|\mathbf{y})\right).$$

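Proof. Applying Lemma 5 and using the convexity of $h(t) = t \log t$, $t > 0$, we derive

$$K\left(L_{pros}(\theta|\mathbf{y}), L_{srs,2}(\theta|\mathbf{y})\right) = \sum_{r=1}^{n} \int g(y; \theta) \Big(\frac{f_{(d_r)}(y; \theta)}{g(y; \theta)}\Big) \log\Big(\frac{f_{(d_r)}(y; \theta)}{g(y; \theta)}\Big) dy$$

$$\ge n \int g(y; \theta) \Big(\frac{\frac{1}{n}\sum_{r=1}^{n} f_{(d_r)}(y; \theta)}{g(y; \theta)}\Big) \log\Big(\frac{\frac{1}{n}\sum_{r=1}^{n} f_{(d_r)}(y; \theta)}{g(y; \theta)}\Big) dy
= n \int f(y; \theta) \log \frac{f(y; \theta)}{g(y; \theta)}\, dy
= K\left(L_{srs,1}(\theta|\mathbf{y}), L_{srs,2}(\theta|\mathbf{y})\right),$$

which shows the first inequality. Similarly,

$$K\left(L_{pros}(\theta|\mathbf{y}), L_{srs,2}(\theta|\mathbf{y})\right) = \sum_{r=1}^{n} \int g(y; \theta)\, h\Big(\frac{1}{m}\sum_{u \in d_r} \frac{f^{(u:S)}(y; \theta)}{g(y; \theta)}\Big) dy
\le \frac{1}{m}\sum_{r=1}^{n}\sum_{u \in d_r} \int f^{(u:S)}(y; \theta) \log \frac{f^{(u:S)}(y; \theta)}{g(y; \theta)}\, dy
= \frac{1}{m} K\left(L_{rss^*}(\theta|\mathbf{y}), L_{srs,2}(\theta|\mathbf{y})\right),$$

which completes the proof. □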
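
The chain of inequalities in Lemma 6 can be illustrated numerically. In the sketch below, f is taken to be the standard exponential pdf and g the exponential pdf with rate 1/2, with n = 2 and S = 6. These choices, the subsets d_r = {(r-1)m+1, ..., rm}, the integration range and the function names are assumptions made for the example only.

```python
import math

def order_stat_pdf(y, u, S, pdf, cdf):
    # pdf of the u-th order statistic in a simple random sample of size S
    F = cdf(y)
    return S * math.comb(S - 1, u - 1) * F ** (u - 1) * (1.0 - F) ** (S - u) * pdf(y)

def kl_sum(densities, g, lo, hi, steps=10000):
    # sum over the given densities p of integral p(y) * log(p(y) / g(y)) dy
    width = (hi - lo) / steps
    total = 0.0
    for p in densities:
        for k in range(steps):
            y = lo + (k + 0.5) * width
            py = p(y)
            if py > 0.0:
                total += py * math.log(py / g(y)) * width
    return total

def lemma6_chain(n, S, f_pdf, f_cdf, g_pdf, lo, hi):
    # returns (K(srs1, srs2), K(pros, srs2), (1/m) * K(rss*, srs2))
    m = S // n
    srs = [f_pdf] * n
    pros = [lambda y, r=r: sum(order_stat_pdf(y, u, S, f_pdf, f_cdf)
                               for u in range((r - 1) * m + 1, r * m + 1)) / m
            for r in range(1, n + 1)]
    rss = [lambda y, u=u: order_stat_pdf(y, u, S, f_pdf, f_cdf) for u in range(1, S + 1)]
    return (kl_sum(srs, g_pdf, lo, hi),
            kl_sum(pros, g_pdf, lo, hi),
            kl_sum(rss, g_pdf, lo, hi) / m)

f_pdf = lambda y: math.exp(-y)
f_cdf = lambda y: 1.0 - math.exp(-y)
g_pdf = lambda y: 0.5 * math.exp(-0.5 * y)
k_srs, k_pros, k_rss_bound = lemma6_chain(2, 6, f_pdf, f_cdf, g_pdf, 0.0, 60.0)
print(k_srs, k_pros, k_rss_bound)
```

For this pair of densities the first value has the closed form n(log 2 - 1/2), which gives a simple check on the integration, and the three values should be non-decreasing.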


5 Concluding Remarks 


In this paper, we have considered the information content and uncertainty associated with PROS samples from
a population. First, we have compared the FI content of PROS samples with the FI content of SRS and RSS
data of the same size under both perfect and imperfect subsetting assumptions. We showed that PROS sampling
design results in more informative observations from the underlying population than simple random sampling and
ranked set sampling. Some examples are presented to show the amount of the extra information provided by PROS
sampling design. We have then considered other information and uncertainty measures such as the Shannon entropy,
Renyi entropy and the KL information measures. Similar results have been obtained under the perfect subsetting
assumption. It would naturally be of interest to extend these results to imperfect subsetting situations. The results
of this paper suggest that one might be able to obtain more powerful tests for hypothesis testing or model selection
problems based on PROS data. For example, it seems promising to develop goodness-of-fit tests based on PROS data
under the KL information measure. We believe that further investigation of PROS sampling design under the missing
information criterion as in Hatefi and Jafari Jozani (2013) is of interest and appealing as well.

Appendix:

FI of unbalanced PROS and the effect of misplacement errors

In this section, we study the FI matrix of the unbalanced PROS sampling design in a general setting when the
subsets are allowed to be of different sizes. To obtain an unbalanced PROS sample, we first need to determine the
sample size $K$ and set size $S$. The judgment sub-setting process is then applied to create $K$ sets. We group these $K$
sets into $N$ cycles $G_i = \{S_{1,i}, \ldots, S_{n_i,i}\}$, $i = 1, \ldots, N$, where $\sum_{i=1}^{N} n_i = K$. Let $D_{r,i} = \{d_{r[1]i}, \ldots, d_{r[n_i]i}\}$ be the
design parameter associated with set $S_{r,i}$, where $d_{r[l]i}$; $l = 1, \ldots, n_i$, is the $l$-th judgment subset in the set $S_{r,i}$. In
each cycle $G_i$; $i = 1, \ldots, N$, we randomly select a unit from each set $S_{r,i}$ (specifically, from the judgment subset
$d_{r[r]i}$; $r = 1, \ldots, n_i$) for full measurement, say $X_{[d_r]i}$, and the number of unranked units in subset $d_{r[r]i}$ is denoted
by $m_{ri}$; $r = 1, \ldots, n_i$; $i = 1, \ldots, N$. To this end, the collection of measured observations $\{X_{[d_r]i};\ r = 1, \ldots, n_i;\ i = 1, \ldots, N\}$
is an unbalanced PROS sample of size $K = \sum_{i=1}^{N} n_i$. Table 9 illustrates the construction of an unbalanced
PROS sample of size $K = 5$ with set size $S = 6$ and $N = 2$ cycles, where we declare three subsets ($n_1 = 3$) and
two subsets ($n_2 = 2$) of different sizes in the first and second cycles, respectively. In each set, $m_{ri}$
represents the number of unranked units in the selected subset. For more details see Ozturk (2011).


Table 9: An example of unbalanced PROS design when S = 6, K = 5, N = 2, n_1 = 3, n_2 = 2 and m_{ri} represents the
size of the selected subset in each set.

cycle  set      Subsets                                                                   m_{ri}  Observation
1      S_{1,1}  D_{1,1} = {d_{1[1]1}, d_{1[2]1}, d_{1[3]1}} = {{1, 2, 3}, {4, 5}, {6}}    3       X_{[d_1]1}
       S_{2,1}  D_{2,1} = {d_{2[1]1}, d_{2[2]1}, d_{2[3]1}} = {{1, 2, 3}, {4, 5}, {6}}    2       X_{[d_2]1}
       S_{3,1}  D_{3,1} = {d_{3[1]1}, d_{3[2]1}, d_{3[3]1}} = {{1, 2, 3}, {4, 5}, {6}}    1       X_{[d_3]1}
2      S_{1,2}  D_{1,2} = {d_{1[1]2}, d_{1[2]2}} = {{1, 2}, {3, 4, 5, 6}}                 2       X_{[d_1]2}
       S_{2,2}  D_{2,2} = {d_{2[1]2}, d_{2[2]2}} = {{1, 2}, {3, 4, 5, 6}}                 4       X_{[d_2]2}

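We first present the following result.

Lemma 7. Let $Y_{ri} = X_{[d_r]i}$ be an observation from an unbalanced PROS sampling design from a continuous distribution
with pdf $f(\cdot; \theta)$. With the knowledge of the design parameter $D_{r,i}$, the pdf of $Y_{ri}$ is given by

$$f_{[r; m_{ri}, d]}(y; \theta) = \frac{1}{m_{ri}} \sum_{v \in d_{r[r]i}} f^{(v:S)}(y; \theta),$$

where $f^{(v:S)}(\cdot; \theta)$ is the pdf of the $v$-th judgment order statistic among $S$ data.

Proof. For each $Y_{ri}$ define the latent vector $\Delta^{[d_r]i} = (\Delta^{[d_r]i}(v);\ v \in d_{r[r]i})$, where

$$\Delta^{[d_r]i}(v) = \begin{cases} 1 & \text{if } Y_{ri} \text{ is selected from the } v\text{-th position within the subset } d_{r[r]i}; \\ 0 & \text{otherwise,} \end{cases}$$

with $\sum_{v \in d_{r[r]i}} \Delta^{[d_r]i}(v) = 1$. The joint pdf of $(Y_{ri}, \Delta^{[d_r]i})$ is given by

$$f(y, \delta^{[d_r]i}; \theta) = \prod_{v \in d_{r[r]i}} \Big\{\frac{1}{m_{ri}} f^{(v:S)}(y; \theta)\Big\}^{\delta^{[d_r]i}(v)}.$$

Furthermore, by summing the joint distribution of $(Y_{ri}, \Delta^{[d_r]i})$ over $\Delta^{[d_r]i} = \delta^{[d_r]i}$, the marginal distribution of $Y_{ri}$
is obtained as follows

$$f_{[r; m_{ri}, d]}(y; \theta) = \sum_{\delta^{[d_r]i}} f(y, \delta^{[d_r]i}; \theta) = \frac{1}{m_{ri}} \sum_{v \in d_{r[r]i}} f^{(v:S)}(y; \theta).$$

□

Using Lemma 7, the likelihood function under an unbalanced PROS design is now given by

$$L(\Omega) = \prod_{i=1}^{N}\prod_{r=1}^{n_i} f_{[r; m_{ri}, d]}(y_{ri}; \Omega)
= \prod_{i=1}^{N}\prod_{r=1}^{n_i} \Big\{\sum_{h=1}^{n_i} \frac{\alpha_{[d_r, d_h]i}}{m_{hi}} \sum_{v \in d_{r[h]i}} f^{(v:S)}(y_{ri}; \theta)\Big\}, \qquad (11)$$

where $\Omega = (\theta, \alpha)$, $f^{(v:S)}(\cdot; \theta)$ is the pdf of the $v$-th order statistic and, in a similar vein to Subsection 3.2,
$\alpha_{[d_r, d_h]i}$ is considered as the misplacement probability of a unit from subset $d_h$ into subset $d_r$ in cycle $i$, so that
$\sum_{h=1}^{n_i} \alpha_{[d_r, d_h]i} = \sum_{r=1}^{n_i} \alpha_{[d_r, d_h]i} = 1$; $i = 1, \ldots, N$. Similarly, one can re-write the likelihood function (11) as follows

$$L(\Omega) = \prod_{i=1}^{N}\prod_{r=1}^{n_i} f(y_{ri}; \theta) \prod_{i=1}^{N}\prod_{r=1}^{n_i} g_{ri}(y_{ri}; \Omega),$$

where

$$g_{ri}(y; \Omega) = \sum_{h=1}^{n_i} \sum_{u \in d_{r[h]i}} \frac{\alpha_{[d_r, d_h]i}}{m_{hi}}\, S \binom{S-1}{u-1} [F(y; \theta)]^{u-1} [1 - F(y; \theta)]^{S-u}. \qquad (12)$$

Similar to Subsection 3.2, to obtain the FI matrix of an unbalanced PROS sample and compare it with its SRS and
RSS counterparts, one can easily obtain the following result.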

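Lemma 8. Let $Y_{ri} = X_{[d_r]i}$; $r = 1, \ldots, n_i$; $i = 1, \ldots, N$, be observed from a continuous distribution with pdf $f(\cdot; \theta)$
using an unbalanced PROS sampling design. Suppose $f_{[r; m_{ri}, d]}(\cdot; \theta)$ and $g_{ri}(\cdot; \Omega)$ are defined as in Lemma 7 and (12),
respectively. Under the regularity conditions of Chen et al. (2004), we have

$$\text{(i)} \quad \sum_{i=1}^{N}\sum_{r=1}^{n_i} E\Big\{\frac{D^2_{\Omega}\, g_{ri}(X_{[d_r]i}; \Omega)}{g_{ri}(X_{[d_r]i}; \Omega)}\Big\} = \sum_{i=1}^{N}\sum_{r=1}^{n_i} E\left\{D^2_{\Omega}\, g_{ri}(X; \Omega)\right\},$$

$$\text{(ii)} \quad \sum_{i=1}^{N}\sum_{r=1}^{n_i} E\Big\{\frac{[D_{\Omega}\, g_{ri}(X_{[d_r]i}; \Omega)][D_{\Omega}\, g_{ri}(X_{[d_r]i}; \Omega)]^{\top}}{g^2_{ri}(X_{[d_r]i}; \Omega)}\Big\} = \sum_{i=1}^{N}\sum_{r=1}^{n_i} E\Big\{\frac{[D_{\Omega}\, g_{ri}(X; \Omega)][D_{\Omega}\, g_{ri}(X; \Omega)]^{\top}}{g_{ri}(X; \Omega)}\Big\}.$$

Now, we can present the main result of this section as follows.

Theorem 4. Under the conditions of Lemma 8, the FI matrix of an unbalanced PROS sample about unknown
parameters $\Omega = (\theta, \alpha)$ is given by

$$I_{upros}(\Omega) = I_{srs}(\theta) - \sum_{i=1}^{N}\sum_{r=1}^{n_i} E\left\{D^2_{\Omega}\, g_{ri}(X; \Omega)\right\} + \sum_{i=1}^{N}\sum_{r=1}^{n_i} E\Big\{\frac{[D_{\Omega}\, g_{ri}(X; \Omega)][D_{\Omega}\, g_{ri}(X; \Omega)]^{\top}}{g_{ri}(X; \Omega)}\Big\}.$$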
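
The structure of this expression can be traced back to the factorised likelihood. The following is only a sketch of that reasoning under the stated regularity conditions, writing $g_{ri}$ for $g_{ri}(\cdot; \Omega)$ and $\psi$ for a generic integrable function (both are shorthand introduced here, not notation from the source), and assuming that $X_{[d_r]i}$ has density $f(x; \theta)\, g_{ri}(x; \Omega)$ while $X$ has density $f(x; \theta)$.

```latex
\begin{align*}
\log L(\Omega) &= \sum_{i=1}^{N}\sum_{r=1}^{n_i} \log f(y_{ri};\theta)
                + \sum_{i=1}^{N}\sum_{r=1}^{n_i} \log g_{ri}(y_{ri};\Omega), \\
-D^2_{\Omega} \log g_{ri} &= -\frac{D^2_{\Omega}\, g_{ri}}{g_{ri}}
                + \frac{[D_{\Omega}\, g_{ri}][D_{\Omega}\, g_{ri}]^{\top}}{g_{ri}^{2}}, \\
E\{\psi(X_{[d_r]i})\} &= \int \psi(x)\, f(x;\theta)\, g_{ri}(x;\Omega)\, dx
                = E\{\psi(X)\, g_{ri}(X;\Omega)\}.
\end{align*}
```

Taking $\psi = D^2_{\Omega}\, g_{ri} / g_{ri}$ and $\psi = [D_{\Omega}\, g_{ri}][D_{\Omega}\, g_{ri}]^{\top} / g_{ri}^{2}$ in the last line gives the two identities of Lemma 8, and substituting them into $-E\{D^2_{\Omega} \log L(\Omega)\}$ yields the second and third terms of the theorem. The first double sum contributes the term recorded there as $I_{srs}(\theta)$.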

Table 10 shows the FI content of unbalanced PROS samples compared with their SRS and RSS counterparts in the
case of normal distribution and when N = 1, S = 6 and two or three subsets (n ∈ {2, 3}) of different sizes have been
declared. The misplacement ranking error models are obtained following the model proposed in Dell and Clutter (1972)
when p ∈ {0.25, 0.5, 0.75, 0.9, 1}.

Table 10: Values of RE_1 and RE_2 to compare the FI content of unbalanced PROS data with its SRS and RSS counterparts
of the same size for normal distribution when S = 6 and n ∈ {2, 3}.

                                                      p
D = {d_1, ..., d_n}          Design     0.25     0.50     0.75     0.90     1.00
{{1,2,3,4,5},{6}}            RE_1      1.134    1.823    3.094    4.754    8.026
                             RE_2      1.110    1.666    2.412    3.006    4.768
{{1,2,3,4},{5,6}}            RE_1      1.038    1.151    1.343    1.510    1.613
                             RE_2      1.018    1.064    1.046    0.962    0.968
{{1,2,3},{4,5,6}}            RE_1      1.020    1.095    1.271    1.513    2.507
                             RE_2      1.002    1.013    0.993    0.959    1.494
{{1,2},{3,4,5,6}}            RE_1      1.040    1.198    1.361    1.547    1.597
                             RE_2      1.020    1.094    1.058    0.980    0.945
{{1},{2,3,4,5,6}}            RE_1      1.137    1.796    3.170    4.748    8.175
                             RE_2      1.120    1.654    2.467    3.021    4.859
{{1},{2},{3,4,5,6}}          RE_1      1.071    1.485    2.196    2.927    3.389
                             RE_2      1.052    1.331    1.599    1.688    1.374
{{1},{2,3},{4,5,6}}          RE_1      1.169    1.444    2.259    3.261    5.810
                             RE_2      1.139    1.263    1.550    1.829    2.301
{{1},{2,3,4},{5,6}}          RE_1      1.120    1.385    2.513    3.620    5.900
                             RE_2      1.079    1.228    1.738    2.063    2.411
{{1},{2,3,4,5},{6}}          RE_1      1.204    2.039    4.263    7.090   16.439
                             RE_2      1.186    1.787    3.018    3.962    6.604
{{1,2},{3,4,5},{6}}          RE_1      1.038    1.544    2.484    3.604    5.734
                             RE_2      1.004    1.373    1.761    2.023    2.278
{{1,2},{3,4},{5,6}}          RE_1      1.032    1.158    1.453    1.865    3.785
                             RE_2      1.005    1.025    1.036    1.045    1.513
{{1,2,3},{4},{5,6}}          RE_1      0.979    0.923    0.918    1.129    2.809
                             RE_2      0.932    0.813    0.652    0.642    1.127
{{1,2,3},{4,5},{6}}          RE_1      0.994    0.939    0.946    1.089    2.874
                             RE_2      0.961    0.845    0.681    0.606    1.143
{{1,2,3,4},{5},{6}}          RE_1      1.086    1.378    2.178    2.955    3.463
                             RE_2      1.077    1.203    1.553    1.685    1.386